Statistical Science 

2011, Vol. 26, No. 3, 423-439 

DOI: 10.1214/11-STS357 

© Institute of Mathematical Statistics, 2011 

Weak Informativity and the Information 
in One Prior Relative to Another 

Michael Evans and Gun Ho Jang 




Abstract. A question of some interest is how to characterize the amount 
of information that a prior puts into a statistical analysis. Rather than 
a general characterization, we provide an approach to characterizing 
the amount of information a prior puts into an analysis, when com- 
pared to another base prior. The base prior is considered to be the 
prior that best reflects the current available information. Our purpose 
then is to characterize priors that can be used as conservative inputs 
to an analysis relative to the base prior. The characterization that we 
provide is in terms of a priori measures of prior-data conflict. 

Key words and phrases: Weak informativity, prior-data conflict, in- 
formation, noninformativity. 



1. INTRODUCTION 

Suppose we have two proper priors $\Pi_1$ and $\Pi_2$ on
a parameter space $\Theta$ for a statistical model $\{P_\theta : \theta \in \Theta\}$.
A natural question to ask is: how do we compare
the amount of information each of these priors
puts into the problem? While there may seem to be 
natural intuitive ways to express this, such as prior 
variances, it seems difficult to characterize this pre- 
cisely in general. For example, the consideration of 
several examples in Sections 3 and 4 makes it clear 
that using the variance of the prior is not appropri- 
ate for this task. 

The motivation for this work comes from Gelman 
(2006) and Gelman et al. (2008), where the intu- 
itively satisfying notion of weakly informative priors 
is introduced as a compromise between informative 
and noninformative priors. The basic idea is that we
have a base prior $\Pi_1$, perhaps elicited, that we believe
reflects our current information about $\theta$, but
we choose to be conservative in our inferences and
select a prior $\Pi_2$ that puts less information into the
analysis. While it is common to take $\Pi_2$ to be a noninformative
prior, this can often produce difficulties
when $\Pi_2$ is improper, and even when $\Pi_2$ is proper,
it seems inappropriate, as it completely discards the
information we have about $\theta$ as expressed in $\Pi_1$.
In addition, we may find that a prior-data conflict
exists with $\Pi_1$ and so look for another prior that reflects
at least some of the information that $\Pi_1$ puts
into an analysis, but avoids the conflict.

We note that our discussion here is only about 
how we should choose $\Pi_2$ given that $\Pi_1$ has already
been chosen. Of course, the choice of $\Pi_1$ is of central
importance in a Bayesian analysis. Ideally, $\Pi_1$
is chosen based on a clearly justified elicitation process,
but we know that this is often not the case.
In such a circumstance it makes sense to try to
choose $\Pi_1$ reasonably but then be deliberately less
informative by choosing $\Pi_2$ to be weakly informative
with respect to $\Pi_1$. The point is to inspire confidence
that our analysis is not highly dependent on infor- 
mation that may be unreliable. To do this, however, 
requires a definition of what it means for one prior 
to be weakly informative with respect to another 
and that is what this paper is about. 
To implement the idea of weak informativity, we
need a precise definition. We provide this in Section 2
and note that it involves the notion of prior-data
conflict. Intuitively, a prior-data conflict occurs
when the prior places the bulk of its mass where
the likelihood is relatively low, as the likelihood is
indicating that the true value of the parameter is
in the tails of the prior. Our definition of weak informativity
is then expressed by saying that $\Pi_2$ is
weakly informative relative to $\Pi_1$ whenever $\Pi_2$ produces
fewer prior-data conflicts a priori than $\Pi_1$.
This leads to a quantifiable expression of weak in- 
formativity that can be used to choose priors. In 
Section 3 we consider this definition in the context 
of several standard families of priors and it is seen 
to produce results that are intuitively reasonable. In 
Section 4 we consider applications of this concept 
in some data analysis problems. While our intuition 
about weak informativity is often borne out, we also 
find that in certain situations we have to be careful 
before calling a prior weakly informative. 

First, however, we establish some notation and 
then review how we check for prior-data conflict. We 
suppose that $P_\theta(A) = \int_A f_\theta(x)\,\mu(dx)$, that is, each $P_\theta$
is absolutely continuous with respect to a support
measure $\mu$ on the sample space $\mathcal{X}$, with the density
denoted by $f_\theta$. With this formulation a prior $\Pi$
leads to a prior predictive probability measure on $\mathcal{X}$
given by $M(A) = \int_\Theta P_\theta(A)\,\Pi(d\theta) = \int_A m(x)\,\mu(dx)$,
where $m(x) = \int_\Theta f_\theta(x)\,\Pi(d\theta)$. If $T$ is a minimal sufficient
statistic for $\{P_\theta : \theta \in \Theta\}$, then it is well known
that the posterior is the same whether we observe $x$
or $T(x)$. So we will denote the posterior by $\Pi(\cdot \mid T)$
hereafter. Since $T$ is minimal sufficient, we know
that the conditional distribution of $x$ given $T$ is independent
of $\theta$. We denote this conditional measure
by $P(\cdot \mid T)$. The joint distribution $P_\theta \times \Pi$ can then
be factored as

(1) $P_\theta \times \Pi = M \times \Pi(\cdot \mid x) = P(\cdot \mid T) \times M_T \times \Pi(\cdot \mid T)$,

where $M_T$ is the marginal prior predictive distribution
of $T$.

While much of Bayesian analysis focuses on the 
third factor in (1), there are also roles in a statistical 
analysis for $P(\cdot \mid T)$ and $M_T$. As discussed in Evans
and Moshonov (2006, 2007), $P(\cdot \mid T)$ is available for
checking the sampling model; for example, if $x$ is
a surprising value from this distribution, then we
have evidence that the model $\{P_\theta : \theta \in \Theta\}$ is incorrect.
Furthermore, it is argued that, if we conclude
that we have no evidence against the model, then
the factor $M_T$ is available for checking whether or
not there is any prior-data conflict, and we do this
by comparing the observed value of $T(x)$ to $M_T$. If
we have no evidence against the model, and no evidence
of prior-data conflict, then we can proceed
to inferences about $\theta$. Actually, the issues involved
in model checking and checking for prior-data conflict
are more involved than this (see, e.g., the cited
references and Section 5), but (1) gives the basic
idea that the full information, as expressed by the
joint distribution of $(\theta, x)$, splits into components,
each of which is available for a specific purpose in
a statistical analysis.

Accordingly, we restrict ourselves here, for any
discussions concerning prior-data conflict, to working
with $M_T$. One issue that needs to be addressed is
how one is to compare the observed value $t_0 = T(x_0)$
to $M_T$. In essence, we need a measure of surprise and
for this we use a P-value. Effectively, we are in the
situation where we have a value from a single fixed
distribution and we need to specify the appropriate
P-value to use. In Evans and Moshonov (2006,
2007) the P-value for checking for prior-data conflict
is given by

(2) $M_T(m_T(t) \le m_T(t_0))$,

where $m_T$ is the density of $M_T$ with respect to the
volume measure on the range space for $T$. In Evans
and Jang (2011) it is proved that, for many of the
models and priors used in statistical analyses, (2)
converges almost surely, as the amount of data increases,
to $\Pi(\pi(\theta) \le \pi(\theta_*))$, where $\theta_*$ is the true
value of $\theta$. So (2) is assessing to what extent the true
value is in the tails of the prior, or, equivalently, to
what extent the prior information is in conflict with
how the data is being generated.
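
As an illustration of (2) in a discrete setting, the following minimal sketch assumes a Binomial$(n, \theta)$ model with a Beta$(a, b)$ prior, so that $m_T$ is the beta-binomial probability function. The function names and the tie tolerance are hypothetical choices made for this sketch only.

```python
from math import comb, exp, lgamma


def beta_binomial_pmf(t, n, a, b):
    """Prior predictive m_T(t) for T ~ Binomial(n, theta), theta ~ Beta(a, b)."""
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    return comb(n, t) * exp(log_beta(t + a, n - t + b) - log_beta(a, b))


def conflict_p_value(t0, n, a, b):
    """P-value (2): prior predictive probability of values no more likely than t0."""
    pmf = [beta_binomial_pmf(t, n, a, b) for t in range(n + 1)]
    cutoff = pmf[t0] * (1.0 + 1e-12)  # small relative tolerance for floating-point ties
    return sum(p for p in pmf if p <= cutoff)
```

A small value of `conflict_p_value` indicates that the observed $t_0$ lies in a region of low prior predictive probability, which is the sense of prior-data conflict used here.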

A difficulty with (2) is that it is not generally in- 
variant to the choice of the minimal sufficient statis- 
tic T. A general invariant P-value is developed in 
Evans and Jang (2010) for situations where we want 
to compare the observed value of a statistic to a fixed 
distribution. This requires that the model and T 
satisfy some regularity conditions, for example, all 
spaces need to be locally Euclidean, support mea- 
sures are given by volume measures on these spaces, 
and T needs to be sufficiently smooth. A formal de- 
scription of these conditions can be found in Tjur 
(1974) and it is noted that these hold for the typical 
statistical application. For example, these conditions 
are immediately satisfied in the discrete case. Fur- 
thermore, for continuous situations, with densities 
defined as limits, we get the usual expressions for
densities. When applied to checking for prior-data
conflict, this leads to using the invariant P-value

(3) $M_T(m^*_T(t) \le m^*_T(t_0))$,

where $m^*_T(t) = \int_{T^{-1}\{t\}} m(x)\,\mu_{T^{-1}\{t\}}(dx) = m_T(t) \cdot E(J_T^{-1}(x) \mid T(x) = t)$,
$\mu_{T^{-1}\{t\}}$ is the volume measure
on $T^{-1}\{t\}$, $J_T(x) = (\det(dT(x) \circ dT'(x)))^{-1/2}$ and
$dT$ is the differential of $T$. Note that $J_T(x)$ gives the
volume distortion produced by $T$ at $x$. So $m^*_T$ is the
density of $M_T$ with respect to the support measure
given by $\{E(J_T^{-1}(x) \mid T(x) = t)\}^{-1}$ times the volume
measure on the range space for $T$.

In applications all models are effectively discrete, 
as we measure responses to some finite accuracy, and 
continuous models are viewed as being approxima- 
tions. The use of (3), rather than (2), then expresses 
the fact that we do not want volume distortions in- 
duced by a transformation to affect our inferences. 
So we allocate this effect of the transformation with 
the support measure, rather than with the density, 
when computing the P-value. In the discrete case, 
as well as when T is linear, (2) and (3) give the same 
value and otherwise seem to give very similar values. 
Convergence of (3) to an invariant P-value based on
the prior is established in Evans and Jang (2011).
We use (3) throughout this paper but note that it 
is only in Section 3.3 where (3) differs from (2). 

Our discussion here is based on a minimal suf- 
ficient statistic T. We note that, except in math- 
ematically pathological situations, such a statistic 
exists. It may be, however, that T is high dimen- 
sional, for example, T can be of the same dimension 
as the data. In such situations the dimensionality 
of the problem can often be reduced by examining 
components of the prior in a hierarchical fashion. 
For example, when the prior on $\theta = (\theta_1, \theta_2)$ is specified
as $\pi(\theta) = \pi_2(\theta_2 \mid \theta_1)\pi_1(\theta_1)$, then $\pi_1$ and $\pi_2(\cdot \mid \theta_1)$
are checked separately and so the definition of weak
informativity applies to each component separately.
This is exemplified by the regression example of Section 4.2
where $\theta = (\theta_1, \theta_2) = (\beta, \sigma^2)$. More on checking
the components of a prior can be found in Evans
and Moshonov (2006). Furthermore, when ancillar- 
ies exist, it is necessary to condition on these when 
checking for prior-data conflict, as this variation has 
nothing to do with the prior. This results in a reduc- 
tion of the dimension of the problem. The relevance 
of ancillarity to the problem of weakly informative 
priors is discussed in Section 5. 



When choosing a prior it makes sense to consider 
the prior distribution of more than just the minimal 
sufficient statistic. For example, Chib and Ergashev 
(2009) consider the prior distribution of a some- 
what complicated function of the parameters and 
data that has a real world interpretation. If this dis- 
tribution produces values that seem reasonable in 
light of what is known, then this goes some distance 
toward justifying the prior. Also, the level of infor- 
mativity of the prior can be judged by looking at the 
prior distribution of this quantity when that is pos- 
sible. While this is certainly a reasonable approach 
to choosing $\Pi_1$, it does not supply us with a definition
of weak informativity. For example, a prior $\Pi_1$
can be chosen as discussed in Chib and Ergashev
(2009), but then $\Pi_2$ could be chosen to be weakly
informative with respect to $\Pi_1$, to inspire confidence
that conclusions drawn are not highly dependent on 
subjective appraisals. 

As we will show, there will typically be many priors
$\Pi_2$ that are weakly informative with respect to
a given base prior $\Pi_1$. The question then arises as to
which $\Pi_2$ we should use. This is partially answered
in Section 2 where we show that the definition of
weak informativity leads to a quantification of how
much less informative $\Pi_2$ is than $\Pi_1$. For example,
we can choose $\Pi_2$ in a family of priors to be 50%
less informative than $\Pi_1$. Still, there may be many
such $\Pi_2$ and at this time we do not have a criterion
that allows us to distinguish among such priors. For 
example, suppose the base prior is a normal prior 
for a location parameter. We can derive weakly in- 
formative priors with respect to such a prior in the 
family of normal priors (see Section 3.1) or in the 
family of t priors (see Section 3.2). There is nothing 
in our developments that suggests that a weakly in- 
formative t prior is to be preferred to a weakly infor- 
mative normal prior or conversely. Such distinctions 
will have to be made based on other criteria. 

2. COMPARING PRIORS 

There are a variety of measures of information used 
in statistics. Several measures have been based on the 
concept of entropy, for example, see Lindley (1956) 
and Bernardo (1979). While these measures have 
their virtues, we note that their coding theory inter- 
pretations can seem somewhat abstract in statisti- 
cal contexts and they can suffer from nonexistence in 
certain problems. Also, Kass and Wasserman (1995)
contains some discussion concerned with expressing
the absolute information content of a prior in terms
of additional sample values. Rather than adopting 
these approaches, we consider comparing priors ba- 
sed on their tendencies to produce prior-data con- 
flicts. This formulation of the relative amount of in- 
formation put into an analysis has a direct interpre- 
tation in terms of statistical consequences. 

Suppose that an analyst has in mind a prior $\Pi_1$
that they believe represents the information at hand
concerning $\theta$. The analyst, however, prefers to use
a prior $\Pi_2$ that is conservative, when compared to $\Pi_1$.
In such a situation it seems reasonable to consider $\Pi_1$
as a base prior and then compare all other priors to
it. This idea comes from Gelman (2006) and leads 
to the notion of weakly informative priors. 

Before we observe data we have no way of know- 
ing if we will have a prior-data conflict. Accordingly, 
since the analyst has determined that $\Pi_1$ best reflects
the available information, it is reasonable to
consider the prior distribution of $P_1(t_0) = M_{1T}(m^*_{1T}(t) \le m^*_{1T}(t_0))$
when $t_0 \sim M_{1T}$. Of course,
this is effectively uniformly distributed [exactly so
when $m^*_{1T}(t)$ has a continuous distribution when
$t \sim M_{1T}$] and this expresses the fact that all the
information about assessing whether or not a prior- 
data conflict exists is contained in the P-value, with 
no need to compare the P-value to its distribution. 

Consider now, however, the distribution of $P_2(t_0) = M_{2T}(m^*_{2T}(t) \le m^*_{2T}(t_0))$,
which is used to check whether
or not there is prior-data conflict with respect
to $\Pi_2$. Given that we have identified that a priori the
appropriate distribution of $t_0$ is $M_{1T}$, at least for inferences
about an unobserved value, then $P_2(t_0)$ is
not uniformly distributed. In fact, from the distribution
of $P_2(t_0)$ we can obtain an intuitively reasonable
idea of what it means for a prior $\Pi_2$ to be weakly
informative relative to $\Pi_1$. Suppose that the prior
distribution of $P_2(t_0)$ clusters around 1. This implies
that, if we were to use $\Pi_2$ as the prior when $\Pi_1$
is appropriate, then there is a small prior probability
that a prior-data conflict would arise. Similarly,
if the prior distribution of $P_2(t_0)$ clusters around 0,
then there is a large prior probability that a prior- 
data conflict would arise. If one prior distribution 
results in a larger prior probability of there being 
a prior-data conflict than another, then it seems rea- 
sonable to say that the first prior is more informative 
than the second. In fact, a completely noninforma- 
tive prior should never produce prior-data conflicts. 

So we compare the distribution of $P_2(t_0)$ when
$t_0 \sim M_{1T}$, to the distribution of $P_1(t_0)$ when $t_0 \sim M_{1T}$,
and do this in a way that is relevant to the
prior probability of obtaining a prior-data conflict.
One approach to this comparison is to select a $\gamma$-quantile
$x_\gamma \in [0, 1]$ of the distribution of $P_1(t_0)$, and
then compute the probability

(4) $M_{1T}(P_2(t_0) \le x_\gamma)$.

The value $\gamma$ is presumably some cutoff, dependent
on the application, where we will consider that evidence
of a prior-data conflict exists whenever
$P_1(t_0) \le \gamma$. Of course, if $m^*_{1T}(t_0)$ has a continuous
distribution when $t_0 \sim M_{1T}$, then $x_\gamma = \gamma$. Our basic
criterion for the weak informativity of $\Pi_2$ relative
to $\Pi_1$ will then be that (4) is less than or equal
to $x_\gamma$. This implies that the prior probability of obtaining
a prior-data conflict under $\Pi_2$ is no greater
than when $\Pi_1$ is used, at least when we have identified
$\Pi_1$ as our correct prior.

Definition 1. If (4) is less than or equal to $x_\gamma$,
then $\Pi_2$ is weakly informative relative to $\Pi_1$ at level $\gamma$.
If $\Pi_2$ is weakly informative relative to $\Pi_1$ at level $\gamma$
for every $\gamma \le \gamma_0$, then $\Pi_2$ is uniformly weakly informative
relative to $\Pi_1$ at level $\gamma_0$. If $\Pi_2$ is weakly informative
relative to $\Pi_1$ at level $\gamma$ for every $\gamma$, then $\Pi_2$
is uniformly weakly informative relative to $\Pi_1$.

Typically we would like to choose a prior $\Pi_2$ that
is uniformly weakly informative with respect to $\Pi_1$.
This still requires us to select a prior from this class,
however, and for this we must choose a level $\gamma$.

Once we have selected $\gamma$, the degree of weak informativity
of a prior $\Pi_2$ relative to $\Pi_1$ can be assessed
by comparing $M_{1T}(P_2(t_0) \le x_\gamma)$ to $x_\gamma$ via the ratio

(5) $1 - M_{1T}(P_2(t_0) \le x_\gamma)/x_\gamma$.

If $\Pi_2$ is weakly informative relative to $\Pi_1$ at level $\gamma$,
then (5) tells us the proportion of fewer prior-data
conflicts we can expect a priori when using $\Pi_2$ rather
than $\Pi_1$. Thus, (5) provides a measure of how much
less informative $\Pi_2$ is than $\Pi_1$ at level $\gamma$. So, for example,
we might ask for a prior $\Pi_2$ that is uniformly
weakly informative with respect to $\Pi_1$ and then,
for a particular $\gamma$, select a prior in this class such
that (5) equals 50%.

As we will see in the examples, it makes sense to 
talk of one prior being asymptotically weakly informative
at level $\gamma$ with respect to another prior in the
sense that (4) is bounded above by $\gamma$ in the limit as
the amount of data increases. In several cases this 
simplifies matters considerably, as an asymptotically 
weakly informative prior is easy to find and may still 
be weakly informative for finite amounts of data. 
While (4) seems difficult to work with, the following
result is proved in the Appendix and gives
a simpler expression.

Lemma 1. Suppose $P_i(t)$ has a continuous distribution
under $M_{iT}$ for $i = 1, 2$. Then there exists $r_\gamma$
such that $M_{1T}(P_2(t) \le \gamma) = M_{1T}(m^*_{2T}(t) \le r_\gamma)$, and
$\Pi_2$ is weakly informative at level $\gamma$ relative to $\Pi_1$
whenever $M_{1T}(m^*_{2T}(t) \le r_\gamma) \le \gamma$. Furthermore, $\Pi_2$
is uniformly weakly informative relative to $\Pi_1$ if and
only if $M_{1T}(m^*_{2T}(t) \le m^*_{2T}(t_0)) \le M_{2T}(m^*_{2T}(t) \le m^*_{2T}(t_0))$
for every $t_0$.

Note that the equivalent condition for uniform
weak informativity in Lemma 1 says that the probability
content, under $M_{1T}$, in the "tails" (regions
of low density) of the density $m^*_{2T}$ is always bounded
above by the probability content under $M_{2T}$. So $M_{2T}$
puts more probability content into these tails than
$M_{1T}$ and this can be taken as an indication that $M_{2T}$
is more dispersed than $M_{1T}$. Lemma 1 typically applies
when we are dealing with continuous distributions
on $\mathcal{X}$. It can also be shown that $P_i(t)$ has a continuous
distribution under $M_{iT}$ if and only if $m^*_{iT}(t)$
has a continuous distribution under $M_{iT}$.

3. DERIVING WEAKLY INFORMATIVE 
PRIORS 

We consider several examples of families of priors 
that arise in applications. These examples support 
our definition of weak informativity and also lead 
to some insights into choosing priors. The results 
obtained for the examples in this section are com- 
bined in Section 4.2 to give results for a practically 
meaningful context. 

We first note that, while we could consider comparing
arbitrary priors $\Pi_2$ to $\Pi_1$, we want $\Pi_2$ to
reflect at least some of the information expressed
in $\Pi_1$. The simplest expression of this is to require
that $\Pi_2$ have the same, or nearly the same, location
as $\Pi_1$. This restriction simplifies the analysis and
seems natural. 

3.1 Comparing Normal Priors 

Suppose we have a sample $x = (x_1, \ldots, x_n)$ from
a $N(\mu, 1)$ distribution where $\mu$ is unknown. Then $t = T(x) = \bar{x} \sim N(\mu, 1/n)$
is minimal sufficient and since
$T$ is linear, there is constant volume distortion
and so this can be ignored. Suppose that the prior $\Pi_1$
on $\mu$ is a $N(\mu_0, \sigma_1^2)$ distribution with $\mu_0$ and $\sigma_1^2$
known. We then have that $M_{1T}$ is the $N(\mu_0, 1/n + \sigma_1^2)$
distribution. Now suppose that $\Pi_2$ is a $N(\mu_0, \sigma_2^2)$
distribution with $\sigma_2^2$ known. Then $M_{2T}$ is the
$N(\mu_0, 1/n + \sigma_2^2)$ distribution and

$P_2(t_0) = M_{2T}(m^*_{2T}(t) \le m^*_{2T}(t_0))$

$\quad = M_{2T}(m_{2T}(t) \le m_{2T}(t_0))$

$\quad = M_{2T}((t - \mu_0)^2 \ge (t_0 - \mu_0)^2)$

$\quad = 1 - G_1((t_0 - \mu_0)^2/(1/n + \sigma_2^2))$,

where $G_k$ denotes the Chi-squared($k$) distribution
function. Now under $M_{1T}$ we have that
$(t_0 - \mu_0)^2/(1/n + \sigma_1^2) \sim$ Chi-squared(1). Therefore,

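$M_{1T}(P_2(t_0) \le \gamma)$

$\quad = M_{1T}(1 - G_1((t_0 - \mu_0)^2/(1/n + \sigma_2^2)) \le \gamma)$

(6) $\quad = M_{1T}\left(\dfrac{(t_0 - \mu_0)^2}{1/n + \sigma_1^2} \ge \dfrac{1/n + \sigma_2^2}{1/n + \sigma_1^2}\, G_1^{-1}(1 - \gamma)\right)$

$\quad = 1 - G_1\left(\dfrac{1/n + \sigma_2^2}{1/n + \sigma_1^2}\, G_1^{-1}(1 - \gamma)\right)$.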

We see immediately that (6) will be less than $\gamma$ if
and only if $\sigma_2^2 > \sigma_1^2$. In other words, $\Pi_2$ will be uniformly
weakly informative relative to $\Pi_1$ if and only
if $\Pi_2$ is more diffuse than $\Pi_1$. Note that $M_{1T}(P_2(t_0) \le \gamma)$
converges to 0 as $\sigma_2^2 \to \infty$ to reflect noninformativity.
Also, as $n \to \infty$, (6) converges to
$1 - G_1((\sigma_2^2/\sigma_1^2)G_1^{-1}(1 - \gamma))$. So we could ignore $n$ and
choose $\sigma_2^2$ conservatively based on this limit, to obtain
an asymptotically uniformly weakly informative
prior, as we know this value of $\sigma_2^2$ will also be weakly
informative for finite $n$.
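
The closed form in (6) is easy to evaluate numerically. The sketch below uses only the standard library, relying on the identities $G_1(x) = \mathrm{erf}(\sqrt{x/2})$ and $G_1^{-1}(u) = \{\Phi^{-1}((1+u)/2)\}^2$; the function names are hypothetical and chosen for this illustration.

```python
from math import erf, sqrt
from statistics import NormalDist


def chi2_1_cdf(x):
    """G_1: distribution function of Chi-squared(1)."""
    return erf(sqrt(x / 2.0)) if x > 0 else 0.0


def chi2_1_quantile(u):
    """G_1^{-1}(u): square of the standard normal (1 + u)/2 quantile, 0 < u < 1."""
    return NormalDist().inv_cdf((1.0 + u) / 2.0) ** 2


def conflict_probability(n, var1, var2, gamma):
    """Expression (6): M_1T(P_2(t_0) <= gamma) for normal priors with a common mean."""
    ratio = (1.0 / n + var2) / (1.0 / n + var1)
    return 1.0 - chi2_1_cdf(ratio * chi2_1_quantile(1.0 - gamma))
```

With `var2 == var1` the function returns `gamma`, and it falls below `gamma` exactly when `var2 > var1`, in line with the statement above.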

If we specify that we want (5) to equal $p \in [0, 1]$,
then (6) implies that $\sigma_2^2 = (1/n + \sigma_1^2)(G_1^{-1}(1 - \gamma + p\gamma)/G_1^{-1}(1 - \gamma)) - 1/n$.
Such a choice will give a proportion
$p$ fewer prior-data conflicts at level $\gamma$ than
the base prior. This decreases to $\sigma_1^2 G_1^{-1}(1 - \gamma + p\gamma)/G_1^{-1}(1 - \gamma)$
as $n \to \infty$ and so the more data
we have the less extra variance we need for $\Pi_2$ for
weak informativity.
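
A minimal sketch of this calculation follows, again with hypothetical function names and standard-library routines only. It solves for $\sigma_2^2$ given a target proportion $p < 1$ and can be checked by substituting the result back into (6).

```python
from math import erf, sqrt
from statistics import NormalDist


def chi2_1_cdf(x):
    """G_1: distribution function of Chi-squared(1)."""
    return erf(sqrt(x / 2.0)) if x > 0 else 0.0


def chi2_1_quantile(u):
    """G_1^{-1}(u) for 0 < u < 1."""
    return NormalDist().inv_cdf((1.0 + u) / 2.0) ** 2


def var2_for_reduction(n, var1, gamma, p):
    """Prior variance sigma_2^2 for which (5) equals p at level gamma (requires p < 1)."""
    q_target = chi2_1_quantile(1.0 - gamma + p * gamma)
    q_base = chi2_1_quantile(1.0 - gamma)
    return (1.0 / n + var1) * q_target / q_base - 1.0 / n


def conflict_probability(n, var1, var2, gamma):
    """Expression (6), used here only to check the solution."""
    ratio = (1.0 / n + var2) / (1.0 / n + var1)
    return 1.0 - chi2_1_cdf(ratio * chi2_1_quantile(1.0 - gamma))
```

For example, with $n = 20$, $\sigma_1^2 = 1$, $\gamma = 0.05$ and $p = 0.5$, plugging the returned variance into `conflict_probability` gives $\gamma(1 - p) = 0.025$.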

We can generalize this to $t \sim N_k(\mu, n^{-1}I)$ with $\Pi_i$
given by $\mu \sim N_k(\mu_0, \Sigma_i)$. Note we have that $M_{iT}$ is
the $N_k(\mu_0, n^{-1}I + \Sigma_i)$ distribution. It is then easy to
see that $P_2(t_0) = 1 - G_k((t_0 - \mu_0)'(n^{-1}I + \Sigma_2)^{-1}(t_0 - \mu_0))$ and

(7) $M_{1T}(P_2(t_0) \le \gamma) = M_{1T}((t_0 - \mu_0)'(n^{-1}I + \Sigma_2)^{-1}(t_0 - \mu_0) \ge G_k^{-1}(1 - \gamma))$.

Note that (7) converges to the probability that
$(t_0 - \mu_0)'\Sigma_2^{-1}(t_0 - \mu_0) \ge G_k^{-1}(1 - \gamma)$, when $t_0 \sim N_k(\mu_0, \Sigma_1)$,
as $n \to \infty$. This probability can be easily computed
via simulation.
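
The simulation can be sketched as follows under simplifying assumptions that are not in the text: $k = 2$ (so that $G_2^{-1}(1 - \gamma) = -2\log\gamma$ in closed form), $\mu_0 = 0$, and diagonal $\Sigma_1$ and $\Sigma_2$ passed as their diagonals. The function name, the number of draws and the seed are hypothetical.

```python
import random
from math import log


def limiting_conflict_probability(diag1, diag2, gamma, draws=100_000, seed=0):
    """Monte Carlo estimate of the n -> infinity limit of (7) for k = 2.

    Assumes mu_0 = 0 and diagonal covariance matrices given by diag1, diag2.
    """
    rng = random.Random(seed)
    cutoff = -2.0 * log(gamma)  # G_2^{-1}(1 - gamma) for Chi-squared(2)
    hits = 0
    for _ in range(draws):
        # Draw t_0 ~ N_2(0, Sigma_1) and form t_0' Sigma_2^{-1} t_0.
        quad = sum(rng.gauss(0.0, v1 ** 0.5) ** 2 / v2
                   for v1, v2 in zip(diag1, diag2))
        hits += quad >= cutoff
    return hits / draws
```

When `diag2 == diag1` the estimate should be close to `gamma`; inflating `diag2` drives it toward 0.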
The following result is proved in the Appendix.

Theorem 1. For a sample of $n$ from the statistical
model $\{N_k(\mu, I) : \mu \in R^k\}$, a $N_k(\mu_0, \Sigma_2)$ prior is
uniformly weakly informative relative to a $N_k(\mu_0, \Sigma_1)$
prior if and only if $\Sigma_2 - \Sigma_1$ is positive semidefinite.

The necessary part of Theorem 1 is much more difficult
than the $k = 1$ case and shows that we cannot
have a $N_k(\mu_0, \Sigma_2)$ prior uniformly weakly informative
relative to a $N_k(\mu_0, \Sigma_1)$ prior unless $\Sigma_2 \ge \Sigma_1$. It
follows from Theorem 1 that a $N_k(\mu_0, \Sigma_2)$ prior is
uniformly weakly informative relative to a $N_k(\mu_0, \Sigma_1)$
prior if and only if a $N(a'\mu_0, a'\Sigma_2 a)$ prior is uniformly
weakly informative relative to a $N(a'\mu_0, a'\Sigma_1 a)$
prior for every $a \in R^k$.

For the choice of $\Sigma_2$ we have that, if $\Sigma_1$ and $\Sigma_2$ are
arbitrary $k \times k$ positive definite matrices, then $r\Sigma_2 \ge \Sigma_1$
whenever $r \ge \lambda_k(\Sigma_1)/\lambda_1(\Sigma_2)$ where $\lambda_i(\Sigma)$ denotes
the $i$th ordered eigenvalue of $\Sigma$. Note that this
condition does not require that the $\Sigma_i$ have the same
eigenvectors. When they do have the same eigenvectors,
so $\Sigma_i = QD_iQ'$ is the spectral decomposition
of $\Sigma_i$, then $\Sigma_2 \ge \Sigma_1$ whenever $\lambda_i(\Sigma_2) \ge \lambda_i(\Sigma_1)$ for
$i = 1, \ldots, k$.
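
The eigenvalue bound can be illustrated for $k = 2$, where the eigenvalues of a symmetric matrix have a closed form. This is only a sketch: it assumes eigenvalues are ordered from smallest ($\lambda_1$) to largest ($\lambda_k$), and the function names are hypothetical.

```python
from math import sqrt


def sym2_eigenvalues(m):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    half_gap = sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - half_gap, mean + half_gap


def sufficient_scale(sigma1, sigma2):
    """Smallest r permitted by the bound r >= lambda_k(Sigma_1) / lambda_1(Sigma_2)."""
    return sym2_eigenvalues(sigma1)[1] / sym2_eigenvalues(sigma2)[0]


def is_psd_difference(r, sigma1, sigma2, tol=1e-12):
    """Check whether r * Sigma_2 - Sigma_1 is positive semidefinite."""
    diff = [[r * sigma2[i][j] - sigma1[i][j] for j in range(2)] for i in range(2)]
    return sym2_eigenvalues(diff)[0] >= -tol
```

The bound is sufficient but not necessary, so smaller values of $r$ may also work when the two matrices do not share eigenvectors.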

3.2 Comparing a t Prior with a Normal Prior 

It is not uncommon to find t priors being sub- 
stituted for normal priors on location parameters. 
Suppose $x = (x_1, \ldots, x_n)$ is a sample from a $N(\mu, 1)$
distribution where $\mu$ is unknown. We take $\Pi_1$ to be
a $N(\mu_0, \sigma_1^2)$ distribution and $\Pi_2$ to be a $t_1(\mu_0, \sigma_2^2, \lambda)$
distribution, that is, $t_1(\mu_0, \sigma_2^2, \lambda)$ denotes the distribution
of $\mu_0 + \sigma_2 z$ with $z$ distributed as a 1-dimensional
$t$ distribution with $\lambda$ degrees of freedom.
We then want to determine $\sigma_2^2$ and $\lambda$ so that the
$t_1(\mu_0, \sigma_2^2, \lambda)$ prior is weakly informative relative to
the normal prior.

We consider first the limiting case as $n \to \infty$. The
limiting prior predictive distribution of the minimal
sufficient statistic $T(x) = \bar{x}$ is $N(\mu_0, \sigma_1^2)$ while $P_2(t_0)$
converges in distribution to $1 - H_{1,\lambda}((t_0 - \mu_0)^2/\sigma_2^2)$
where $H_{1,\lambda}$ is the distribution function of an $F_{1,\lambda}$
distribution. This implies that (4) converges to
$1 - G_1((\sigma_2^2/\sigma_1^2)H_{1,\lambda}^{-1}(1 - \gamma))$ and this is less than or equal
to $\gamma$ if and only if $\sigma_2^2/\sigma_1^2 \ge G_1^{-1}(1 - \gamma)/H_{1,\lambda}^{-1}(1 - \gamma)$.
So to have that $\Pi_2$ is asymptotically weakly informative
relative to $\Pi_1$ at level $\gamma$, we must choose $\sigma_2^2$
large enough. Clearly we have that $\Pi_2$ is asymptotically
uniformly weakly informative relative to $\Pi_1$ if
and only if

$\sigma_2^2/\sigma_1^2 \ge K(\lambda) = \sup_{\gamma \in [0,1]} G_1^{-1}(1 - \gamma)/H_{1,\lambda}^{-1}(1 - \gamma)$.

In Figure 1 we have plotted $K(\lambda)$ against $\log(\lambda)$.

Fig. 1. Plot of $K(\lambda)$ against $\log(\lambda)$ where a $t_1(\mu_0, \sigma_2^2, \lambda)$
prior is asymptotically uniformly weakly informative relative
to a $N(\mu_0, \sigma_1^2)$ prior if and only if $\sigma_2^2/\sigma_1^2 \ge K(\lambda)$.

Since $K(1) = 0.6366$, we require that $\sigma_2^2 \ge \sigma_1^2(0.6366)$
for a Cauchy prior to be uniformly weakly
informative with respect to a $N(\mu_0, \sigma_1^2)$ prior.
A $t_1(\mu_0, \sigma_2^2, 3)$ prior has variance $3\sigma_2^2$. If we choose $\sigma_2^2$
so that the variance is $\sigma_1^2$, then $\sigma_2^2/\sigma_1^2 = 1/3$. Since
this is less than $K(3) = 0.8488$, this prior is not
uniformly weakly informative. A $t_1(\mu_0, \sigma_2^2, 3)$ prior
has to have variance at least equal to $(2.5464)\sigma_1^2$ if
we want it to be uniformly weakly informative relative
to a $N(\mu_0, \sigma_1^2)$ prior. This is somewhat surprising
and undoubtedly is caused by the peakedness of
the $t$ distribution. Note that $K(\lambda) \to 1$ as $\lambda \to \infty$,
so this increase in variance, for the $t$ prior over the
normal prior, decreases as we increase the degrees
of freedom.
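
The level-by-level asymptotic criterion $\sigma_2^2/\sigma_1^2 \ge G_1^{-1}(1 - \gamma)/H_{1,\lambda}^{-1}(1 - \gamma)$ can be checked numerically for $\lambda = 3$, where the $t_3$ distribution function has a closed form. The sketch below is an illustration under that assumption; the function names and the bisection settings are hypothetical.

```python
from math import atan, pi, sqrt
from statistics import NormalDist


def chi2_1_quantile(u):
    """G_1^{-1}(u) for 0 < u < 1."""
    return NormalDist().inv_cdf((1.0 + u) / 2.0) ** 2


def f_1_3_cdf(x):
    """H_{1,3}: distribution function of F(1, 3), i.e. of the square of a t_3 variable."""
    s = sqrt(x / 3.0)
    return (2.0 / pi) * (s / (1.0 + s * s) + atan(s))


def f_1_3_quantile(u, hi=1e9):
    """H_{1,3}^{-1}(u) by bisection on the closed-form distribution function."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_1_3_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def required_ratio(gamma):
    """Smallest sigma_2^2 / sigma_1^2 for asymptotic weak informativity at level gamma."""
    return chi2_1_quantile(1.0 - gamma) / f_1_3_quantile(1.0 - gamma)
```

For the variance-matched choice $\sigma_2^2/\sigma_1^2 = 1/3$, `required_ratio(0.01)` lies below 1/3 while `required_ratio(0.05)` lies above it, so in the limit this prior is weakly informative at the smaller level only.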

The situation for finite n is covered by the follow- 
ing result proved in the Appendix. 

Theorem 2. For a sample of $n$ from the statistical
model $\{N(\mu, 1) : \mu \in R^1\}$, a $t_1(\mu_0, \sigma_2^2, \lambda)$ prior is
uniformly weakly informative relative to a $N(\mu_0, \sigma_1^2)$
prior whenever $\sigma_2^2 \ge \sigma_{0n}^2$, where $\sigma_{0n}^2$ is the unique
solution of $(1/n + \sigma_1^2)^{-1/2} = \int_0^\infty (1/n + \sigma_{0n}^2/u)^{-1/2} k_\lambda(u)\,du$
with $k_\lambda$ the Gamma$_{\text{rate}}(\lambda/2, \lambda/2)$ density.
Further, $\sigma_{0n}^2/\sigma_1^2$ increases to

(8) $K(\lambda) = \sup_{\gamma \in [0,1]} \dfrac{G_1^{-1}(1 - \gamma)}{H_{1,\lambda}^{-1}(1 - \gamma)} = \dfrac{2\Gamma^2((\lambda + 1)/2)}{\lambda\Gamma^2(\lambda/2)}$

as $n \to \infty$ and so a $t_1(\mu_0, \sigma_2^2, \lambda)$ prior is asymptotically
uniformly weakly informative if and only if
$\sigma_2^2/\sigma_1^2$ is greater than or equal to (8).

Fig. 2. Plot of (4) versus $\gamma$ for $t_1(0, \sigma_2^2, 3)$ priors relative
to a $N(0, 1)$ prior when $n = 20$, where $\sigma_2^2$ is chosen to match
variances (thick solid line), match the MAD (dashed line), just
achieve uniform weak informativity (dotted line), just achieve
asymptotic uniform weak informativity (dash-dot line), and
equal to 1 (long-dashed line).

Theorem 2 establishes that we can conservatively 
use (8) to select a uniformly weakly informative t 
prior. 
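
The closed form on the right-hand side of (8) can be evaluated directly with log-gamma functions. This is a small sketch with a hypothetical function name; it reproduces the constants quoted in this section.

```python
from math import exp, lgamma


def k_lambda(df):
    """Closed form in (8): 2 * Gamma((df + 1)/2)^2 / (df * Gamma(df/2)^2)."""
    return 2.0 * exp(2.0 * (lgamma((df + 1.0) / 2.0) - lgamma(df / 2.0))) / df
```

Here `k_lambda(1)` equals $2/\pi$ and `k_lambda(3)` equals $8/(3\pi)$, matching $K(1) = 0.6366$ and $K(3) = 0.8488$, and the minimal variance multiple for the $t$ prior with 3 degrees of freedom is `3 * k_lambda(3)`.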

In Figure 2 we have plotted the value of (4) that
arises with $t_1(0, \sigma_2^2, 3)$ priors, where $\sigma_2^2$ is chosen in
a variety of ways, together with the 45-degree line.
A uniformly weakly informative prior will have (4)
always below the 45-degree line, while a uniformly
weakly informative prior at level $\gamma_0$ will have (4)
below the 45-degree line to the left of $\gamma_0$ and possibly
above to the right of $\gamma_0$. For example, when $\sigma_2^2 = 1/3$,
then the $t_1(0, \sigma_2^2, 3)$ prior and the $N(0, 1)$ prior
have the same variance. We see that this prior is only
uniformly weakly informative at level $\gamma_0 = 0.0357$
and is not uniformly weakly informative.



Note that (5) converges to $1 - \{1 - G_1((\sigma_2^2/\sigma_1^2)H_{1,\lambda}^{-1}(1 - \gamma))\}/\gamma$
as $n \to \infty$, and setting this equal to $p$ implies
that $\sigma_2^2 = \sigma_1^2 G_1^{-1}(1 - \gamma + \gamma p)/H_{1,\lambda}^{-1}(1 - \gamma)$, which converges,
as $\lambda \to \infty$, to the result we obtained in Section 3.1.
So when $\lambda = 3$, $\gamma = 0.05$ and $p = 0.5$, we
must have $\sigma_2^2/\sigma_1^2 = 5.0239/10.1280 = 0.49604$.

Our analysis indicates that one has to be careful
about the scaling of the $t$ prior if we want to say that
the $t$ prior is less informative than a normal prior,
at least when we want uniform weak informativity.

Consider now comparing a multivariate $t$ prior to
a multivariate normal prior. Let $t_k(\mu_0, \Sigma_2, \lambda)$ denote
the $k$-dimensional $t$ distribution given by $\mu_0 + \Sigma_2^{1/2} z$,
where $\Sigma_2^{1/2}$ is a square root of the positive definite
matrix $\Sigma_2$ and $z$ has a $k$-dimensional $t$ distribution
with $\lambda$ degrees of freedom. This is somewhat more
complicated than the normal case, but we prove the
following result in the Appendix which provides sufficient
conditions for the asymptotic uniform weak
informativity.

Theorem 3. When sampling from the statistical
model $\{N_k(\mu, I) : \mu \in R^k\}$, a $t_k(\mu_0, \Sigma_2, \lambda)$ prior is
asymptotically uniformly weakly informative relative
to a $N_k(\mu_0, \Sigma_1)$ prior whenever $\Sigma_2 - \tau^2\Sigma_1$ is positive
semidefinite, where $\tau^2 = (2/\lambda)\Gamma^{2/k}((k + \lambda)/2)/\Gamma^{2/k}(\lambda/2)$.

In contrast with Theorem 1, we do not have an
equivalent characterization of the uniform weak informativity
of multivariate $t$ priors in terms of the
marginal priors of $a'\mu$. For example, when $k = 2$,
then $\tau^2 = 1$ and when $k = 1$, then $\tau^2 = 2\Gamma^2((\lambda + 1)/2)/\lambda\Gamma^2(\lambda/2) < 1$
for all $\lambda$. Therefore, $a'\Sigma_2 a - \{2\Gamma^2((\lambda + 1)/2)/\lambda\Gamma^2(\lambda/2)\}a'\Sigma_1 a \ge 0$
for all $a$ does not imply
that $\Sigma_2 - \Sigma_1$ is positive semidefinite, for example,
take $\Sigma_2 = \Sigma_1(1 + 2\Gamma^2((\lambda + 1)/2)/\lambda\Gamma^2(\lambda/2))/2$.

For the choice of $\Sigma_2$ we have that, if $\Sigma_1$ and $\Sigma_2$
are arbitrary $k \times k$ positive definite matrices, then
$r\Sigma_2 \ge \tau^2\Sigma_1$ whenever $r \ge \tau^2\lambda_k(\Sigma_1)/\lambda_1(\Sigma_2)$. When
the $\Sigma_i$ have the same eigenvectors, then $\Sigma_2 \ge \tau^2\Sigma_1$
whenever $\lambda_i(\Sigma_2) \ge \tau^2\lambda_i(\Sigma_1)$ for $i = 1, \ldots, k$.

3.3 Comparing Inverse Gamma Priors

Suppose now that we have a sample $x = (x_1, \ldots, x_n)$
from a $N(0, \sigma^2)$ distribution where $\sigma^2$ is unknown.
Then $t = T(x) = (x_1^2 + \cdots + x_n^2)/n$ is minimal sufficient
and $T \sim$ Gamma$_{\text{rate}}(n/2, n/(2\sigma^2))$. Now suppose
that we take $\Pi_i$ to be an inverse gamma prior on $\sigma^2$,
namely, $\sigma^{-2} \sim$ Gamma$_{\text{rate}}(\alpha_i, \beta_i)$. From this we get
that $\alpha_i T/\beta_i \sim F(n, 2\alpha_i)$ and, since $J_T(x) = (4x'x/n^2)^{-1/2} = (4t/n)^{-1/2}$,
$m^*_{iT,n}(t) = m_{iT,n}(t) \cdot (4t/n)^{1/2} \propto t^{(n-1)/2}(1 + nt/(2\beta_i))^{-n/2-\alpha_i}$,
which implies

$P_{i,n}(t_0) = M_{iT}(t^{(n-1)/2}(1 + nt/(2\beta_i))^{-n/2-\alpha_i} \le t_0^{(n-1)/2}(1 + nt_0/(2\beta_i))^{-n/2-\alpha_i})$.

We want to investigate the weak informativity of
a Gamma$_{\text{rate}}(\alpha_2, \beta_2)$ prior relative to a Gamma$_{\text{rate}}(\alpha_1, \beta_1)$
prior. For finite $n$ this is a difficult problem, so
we simplify this by considering only the asymptotic
case. When the prior is $\Pi_i$, then, as $n \to \infty$, we have
that $m_{iT,n}(t) \to m_{iT}(t) = (\beta_i^{\alpha_i}/\Gamma(\alpha_i))\,t^{-\alpha_i-1}e^{-\beta_i/t}$,
that is, $1/t \sim$ Gamma$_{\text{rate}}(\alpha_i, \beta_i)$.
Therefore, $P_{2,n}(t_0) \to P_2(t_0) = \Pi_2(t^{-\alpha_2-1/2}e^{-\beta_2/t} \le t_0^{-\alpha_2-1/2}e^{-\beta_2/t_0})$
and we want to determine conditions
on $(\alpha_2, \beta_2)$ so that $\Pi_1(P_2 \le \gamma) \le \gamma$.

Fig. 3. Plot of $(\alpha, \beta)$ corresponding to Beta$(\alpha, \beta)$ priors that are weakly informative at level $\gamma = 0.05$ (light and dark shading)
and uniformly weakly informative (light shading) for $n = 20$ (on the left), $n = 100$ (middle) and $n = \infty$ (on the right).

While results can be obtained for this problem, it is still rather difficult. It is greatly simplified, however, if we impose a natural restriction on $(\alpha_2, \beta_2)$. In particular, we want the bulk of the mass for $\Pi_2$ to be located roughly in the same place as the bulk of the mass for $\Pi_1$. Accordingly, we could require the priors to have the same means or modes, but, as it turns out, the constraint that requires the modes of the $m^*_{iT}$ functions to be the same greatly simplifies the analysis. Actually, $m^*_{iT,n}(t)$ converges to 0, but the $n$'s cancel in the inequalities defining $P_{i,n}(t_0)$ and so we can define $m^*_{iT}(t) = t^{-\alpha_i-1/2} e^{-\beta_i/t}$, which has its mode at $t = \beta_i/(\alpha_i + 1/2)$. Therefore, we must have $\beta_2/(\alpha_2 + 1/2) = \beta_1/(\alpha_1 + 1/2)$ so that $(\alpha_2, \beta_2)$ lies on the line through the points $(0, \beta_1/(2(\alpha_1 + 1/2)))$ and $(\alpha_1, \beta_1)$. We prove the following result in the Appendix.

Theorem 4. Suppose we use a $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_1, \beta_1)$ prior on $1/\sigma^2$ when sampling from the statistical model $\{N(0, \sigma^2) : \sigma^2 > 0\}$. Then a $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_2, \beta_2)$ prior on $1/\sigma^2$, with $\beta_2/(\alpha_2 + 1/2) = \beta_1/(\alpha_1 + 1/2)$, is asymptotically weakly informative relative to the $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_1, \beta_1)$ prior whenever $\alpha_2 \le \alpha_1$ and $\beta_2 = \beta_1(\alpha_2 + 1/2)/(\alpha_1 + 1/2)$ or, equivalently, whenever $\beta_1/(2(\alpha_1 + 1/2)) < \beta_2 \le \beta_1$ and $\alpha_2 = (\alpha_1 + 1/2)\beta_2/\beta_1 - 1/2$.

Of particular interest here is that we cannot reduce the rate parameter $\beta_2$ arbitrarily close to 0 and be guaranteed asymptotic weak informativity.
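The mode-matching constraint in Theorem 4 can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the function names and the base prior values $(\alpha_1, \beta_1) = (3, 2)$ are hypothetical, and the check only encodes the sufficient condition stated in the theorem.

```python
def matched_rate(alpha1, beta1, alpha2):
    """Rate beta2 that puts Gamma_rate(alpha2, beta2) on the mode-matching line."""
    return beta1 * (alpha2 + 0.5) / (alpha1 + 0.5)


def satisfies_theorem4(alpha1, beta1, alpha2, beta2, tol=1e-12):
    """True when (alpha2, beta2) meets the sufficient condition of Theorem 4."""
    on_line = abs(beta2 - matched_rate(alpha1, beta1, alpha2)) <= tol * max(1.0, beta1)
    return 0 < alpha2 <= alpha1 and on_line


# Hypothetical base prior, chosen only for illustration.
alpha1, beta1 = 3.0, 2.0

# Limit of the admissible rates as alpha2 decreases to 0.
lower_bound = beta1 / (2 * (alpha1 + 0.5))

for alpha2 in (0.5, 1.0, 2.0, 3.0):
    beta2 = matched_rate(alpha1, beta1, alpha2)
    print(alpha2, beta2, satisfies_theorem4(alpha1, beta1, alpha2, beta2))
```

Every admissible rate stays above `lower_bound`, which is the numerical counterpart of the remark that $\beta_2$ cannot be pushed arbitrarily close to 0.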

4. APPLICATIONS 

We consider now some applications of determining 
weakly informative priors. 



4.1 Weakly Informative Beta Priors for the 
Binomial 

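Suppose that $T \sim \mathrm{Binomial}(n, \theta)$ and $\theta \sim \mathrm{Beta}(\alpha, \beta)$. This implies that $m_T(t) = \binom{n}{t}\Gamma(\alpha + \beta)\Gamma(t + \alpha)\Gamma(n - t + \beta)/[\Gamma(\alpha)\Gamma(\beta)\Gamma(n + \alpha + \beta)]$ and from this we can compute (4) for various choices of $(\alpha, \beta)$.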

As a specific example, suppose that $n = 20$, the base prior is given by $(\alpha, \beta) = (6, 6)$, and we take $\gamma = 0.05$ so that $x_{0.05} = 0.0588$. As alternatives to this base prior, we consider Beta$(\alpha, \beta)$ priors. In Figure 3 we have plotted all the $(\alpha, \beta)$ corresponding to Beta$(\alpha, \beta)$ distributions that are weakly informative with respect to the Beta(6, 6) distribution at level 0.05, together with the subset of all $(\alpha, \beta)$ corresponding to Beta$(\alpha, \beta)$ distributions that are uniformly weakly informative relative to the Beta(6, 6) distribution. The graph on the left corresponds to $n = 20$, the middle graph corresponds to $n = 100$, and the graph on the right corresponds to $n = \infty$. The plot for $n = 20$ shows some anomalous effects due to the discreteness of the prior predictive distributions and these effects disappear as $n$ increases. In such an application we may choose to restrict to symmetric priors, as this fixes the primary location of the prior mass. For example, when $n = 20$, a Beta$(\alpha, \alpha)$ prior for $\alpha$ satisfying $1 \le \alpha \le 12.3639$ is uniformly weakly informative with respect to the Beta(6, 6) prior and we see that values of $\alpha > 6$ are eliminated as $n$ increases.
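The binomial computation can be sketched directly from the beta-binomial prior predictive. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the P-value is taken as $M_T(m_T(t) \le m_T(t_0))$, and $x_\gamma$ is read as the smallest attainable value $x$ of $P_1$ with $M_{1T}(P_1 \le x) \ge \gamma$ (a reading that is consistent with $x_{0.05} = 0.0588$ above).

```python
from math import exp, lgamma

TOL = 1e-12


def prior_predictive(n, a, b):
    """Beta-binomial pmf m_T(t), t = 0..n, for T ~ Binomial(n, theta), theta ~ Beta(a, b)."""
    def log_m(t):
        log_choose = lgamma(n + 1) - lgamma(t + 1) - lgamma(n - t + 1)
        return (log_choose + lgamma(a + b) + lgamma(t + a) + lgamma(n - t + b)
                - lgamma(a) - lgamma(b) - lgamma(n + a + b))
    return [exp(log_m(t)) for t in range(n + 1)]


def p_values(m):
    """P(t0) = M_T(m_T(t) <= m_T(t0)) for every t0; ties matched up to a relative tolerance."""
    return [sum(q for q in m if q <= p * (1 + TOL)) for p in m]


def x_gamma(m1, p1, gamma):
    """Assumed reading: smallest attainable value x of P_1 with M_1T(P_1 <= x) >= gamma."""
    for x in sorted(p1):
        if sum(q for q, p in zip(m1, p1) if p <= x + TOL) >= gamma - TOL:
            return x
    return 1.0


def conflict_probability(n, base, alt, gamma):
    """Return (x_gamma, M_1T(P_2(t) <= x_gamma)) for Beta priors base and alt."""
    m1, m2 = prior_predictive(n, *base), prior_predictive(n, *alt)
    p1, p2 = p_values(m1), p_values(m2)
    x = x_gamma(m1, p1, gamma)
    return x, sum(q for q, p in zip(m1, p2) if p <= x + TOL)


# Uniform prior Beta(1, 1) compared with the Beta(6, 6) base prior at n = 20.
x, prob = conflict_probability(20, (6, 6), (1, 1), 0.05)
print(round(x, 4), prob <= x)
```

Under these assumptions a candidate prior is weakly informative at level $\gamma$ when the returned probability does not exceed the returned $x_\gamma$; scanning a grid of $(\alpha, \beta)$ values with `conflict_probability` would give a rough version of the left panel of Figure 3.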

4.2 Weakly Informative Priors for the Normal 
Regression Model 

Consider the situation where $y \sim N_n(X\beta, \sigma^2 I)$, $X \in R^{n \times k}$ is of rank $k$ and $\beta \in R^k$, $\sigma^2 > 0$ are unknown. Therefore, $T = (b, s^2)$ with $b = (X'X)^{-1}X'y$ and $s^2 = \|y - Xb\|^2$. Suppose we have elicited a prior on $(\beta, \sigma^2)$ given by $1/\sigma^2 \sim \mathrm{Gamma}_{\mathrm{rate}}(\alpha_1, \tau_1)$, and $\beta \mid \sigma^2 \sim N_k(\beta_0, \sigma^2\Sigma_1)$. We now find a prior that is asymptotically uniformly weakly informative relative to this choice. For this we consider gamma priors for $1/\sigma^2$ and $t$ priors for $\beta$ given $\sigma^2$. For the asymptotics we suppose that $\lambda_k((X'X)^{-1}) \to 0$ as $n \to \infty$.

As discussed in Evans and Moshonov (2006, 2007), it seems that the most sensible way to check for prior-data conflict here is to first check the prior on $\sigma^2$, based on the prior predictive distribution of $s^2$. If no prior-data conflict is found at this stage, then we check the prior on $\beta$ based on the conditional prior predictive for $b$ given $s^2$, as $s^2$ is ancillary for $\beta$. Such an approach provides more information concerning where a prior-data conflict exists than simply checking the whole prior via (3).

So we consider first obtaining an asymptotically uniformly weakly informative prior for $1/\sigma^2$. We have that $s^2 \mid \sigma^2 \sim \mathrm{Gamma}_{\mathrm{rate}}((n - k)/2, (n - k)/2\sigma^2)$ and so, as in Section 3.3, when $1/\sigma^2 \sim \mathrm{Gamma}_{\mathrm{rate}}(\alpha_i, \tau_i)$, the limiting prior predictive distribution of $1/s^2$ is $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_i, \tau_i)$ as $n \to \infty$. Furthermore, when $T_2(x) = s^2$, then $J_{T_2}(x) = (4s^2/(n - k))^{-1/2}$. Therefore, the limiting value of (4) in this case is the same as that discussed in Section 3.3 and Theorem 4 applies to obtain a $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_2, \tau_2)$ prior asymptotically uniformly weakly informative relative to the $\mathrm{Gamma}_{\mathrm{rate}}(\alpha_1, \tau_1)$ prior.

If we consider $s^2$ as an arbitrary fixed value from its prior predictive distribution, then, when $\beta \mid \sigma^2 \sim N_k(\beta_0, \sigma^2\Sigma_1)$, the conditional prior predictive distribution of $b$ given $s^2$ converges to the $N_k(\beta_0, s^2\Sigma_1)$ distribution. Furthermore, when $\beta \mid \sigma^2 \sim t_k(\beta_0, \sigma^2\Sigma_2, \lambda)$, the conditional prior predictive distribution of $b$ given $s^2$ converges to the $t_k(\beta_0, s^2\Sigma_2, \lambda)$ distribution. So we can apply Lemma 1 to these limiting distributions. It is then clear that the comparison is covered by Theorem 3, as the limiting prior predictives are of the same form. Therefore, the $t_k(\beta_0, \sigma^2\Sigma_2, \lambda)$ prior is asymptotically uniformly weakly informative relative to the $N_k(\beta_0, \sigma^2\Sigma_1)$ prior whenever $s^2\Sigma_2 \ge s^2\tau^2\Sigma_1$ or, equivalently, whenever $\Sigma_2 \ge \tau^2\Sigma_1$ where $\tau^2$ is defined in Theorem 3. Note that this condition does not depend on $s^2$. Also, as $\lambda \to \infty$, we can use Theorem 2 to obtain that a $N_k(\beta_0, \sigma^2\Sigma_2)$ prior is asymptotically uniformly weakly informative relative to the $N_k(\beta_0, \sigma^2\Sigma_1)$ prior whenever $\Sigma_2 \ge \Sigma_1$.
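To get a feel for the size of the scaling constant $\tau^2$, the following sketch evaluates it for a few degrees of freedom. The closed form used here, $\tau^2 = (2/\lambda)[\Gamma((k + \lambda)/2)/\Gamma(\lambda/2)]^{2/k}$, is an assumption: it is inferred from the cancellation in the proof of Theorem 3 in the Appendix and agrees with the constant $(2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$ of Lemma A.4 when $k = 1$. The function name is hypothetical.

```python
from math import exp, lgamma, pi


def tau_squared(k, lam):
    """Assumed form: (2/lam) * (Gamma((k + lam)/2) / Gamma(lam/2)) ** (2/k)."""
    log_ratio = lgamma((k + lam) / 2) - lgamma(lam / 2)
    return (2.0 / lam) * exp(2.0 * log_ratio / k)


# Scalar case k = 1: the t prior needs a scale of at least tau^2 times the normal one.
for lam in (1, 3, 10, 100):
    print(lam, tau_squared(1, lam))
```

For $k = 1$ the value increases toward 1 as $\lambda$ grows, which matches the statement that the normal comparison $\Sigma_2 \ge \Sigma_1$ is recovered as $\lambda \to \infty$.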

4.3 Weakly Informative Priors for Logistic 
Regression 

Supposing we have a single binary valued response variable $Y$ and $k$ quantitative predictors $X_1, \ldots, X_k$, we observe $(Y, X_1, \ldots, X_k)$ at $q$ settings of the predictor variables and have $n_i$ observations at the $i$th setting of the predictors. The logistic regression model then says that $Y_{ij} \sim \mathrm{Bernoulli}(p_i)$ where $\log(p_i/(1 - p_i)) = \beta_0 + \beta_1(x_{i1} - \bar{x}_{\cdot 1}) + \cdots + \beta_k(x_{ik} - \bar{x}_{\cdot k})$ for $j = 1, \ldots, n_i$ and $i = 1, \ldots, q$ and the $\beta_j$ are unknown real values. For simplicity, we will assume no $x_{ij} - \bar{x}_{\cdot j}$ is zero. For this model $T = (T_1, \ldots, T_q)$, with $T_i = Y_{i1} + \cdots + Y_{in_i}$, is a minimal sufficient statistic. For the base prior we suppose that $\Pi_1$ is the product of independent priors on the $\beta_j$'s and we consider the problem of finding a prior $\Pi_2$ that is weakly informative relative to $\Pi_1$. For example, we could take $\Pi_1$ to be a product of $N(0, \sigma_{1j}^2)$ priors and $\Pi_2$ to be a product of $N(0, \sigma_{2j}^2)$ priors and choose the $\sigma_{2j}^2$ so that weak informativity is obtained. Note that since $T$ is discrete we can use (2) in our computations.

Table 1

Dose (g/ml)    Number of animals $n_i$    Number of deaths $t_i$
0.422          5                          0
0.744          5                          1
0.948          5                          3
2.069          5                          5

As we will see, it is not the case that choosing the $\sigma_{2j}^2$ very large relative to the $\sigma_{1j}^2$ will necessarily make $\Pi_2$ weakly informative relative to $\Pi_1$. In fact, there is only a finite range of $\sigma_{2j}^2$ values where weak informativity will obtain.

While this can be demonstrated analytically, the 
argument is somewhat technical and it is perhaps 
easier to see this in an example. The following bioas- 
say data are from Racine et al. (1986) and were also 
analyzed in Gelman et al. (2008). These data arise 
from an experiment where 20 animals were exposed 
to four doses of a toxin and the number of deaths 
recorded (Table 1). 

Following Gelman et al. (2008), we took $X_1$ to be the variable formed by calculating the logarithm of dose and then standardizing to make the mean of $X_1$ equal to 0 and its standard deviation equal to 1/2. Gelman et al. (2008) placed independent Cauchy priors on the regression coefficients, namely, $\beta_0 \sim t_1(0, 10^2, 1)$ independent of $\beta_1 \sim t_1(0, 2.5^2, 1)$.
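The standardization of the predictor can be sketched as follows. This is a minimal illustration using the doses of Table 1; the helper name is hypothetical, and the use of the population (divide-by-$q$) standard deviation over the four dose levels is an assumption, since the text does not say which convention was applied.

```python
from math import log, sqrt

doses = [0.422, 0.744, 0.948, 2.069]  # g/ml, from Table 1


def standardize(values, target_sd=0.5):
    """Centre values at 0 and rescale them to the requested standard deviation.

    Assumption: the population (divide-by-q) standard deviation is used.
    """
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [target_sd * (v - mean) / sd for v in values]


x1 = standardize([log(d) for d in doses])
print(x1)
```

Because every dose level has the same number of animals, weighting the levels by $n_i$ would give the same centering and scaling.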

We consider four possible scenarios for the investigation of weak informativity at level $\gamma = 0.05$ and uniform weak informativity. In Figure 4(a) we compare $\Pi_2 = N(0, \sigma_0^2) \times N(0, \sigma_1^2)$ priors with the prior $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$. The entire region gives the $(\sigma_0, \sigma_1)$ values corresponding to priors that are weakly informative at level $\gamma = 0.05$, while the lighter subregion gives the $(\sigma_0, \sigma_1)$ values corresponding to priors that are uniformly weakly informative. Note that some of the irregularity in the plots is caused by the fact that the prior predictive distributions of $T$ are discrete. The three remaining plots are similar where in Figure 4(b) $\Pi_1 = t_1(0, 10^2, 1) \times t_1(0, 2.5^2, 1)$ and $\Pi_2 = t_1(0, \sigma_0^2, 1) \times t_1(0, \sigma_1^2, 1)$, in Figure 4(c) $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$ and $\Pi_2 = t_1(0, \sigma_0^2, 1) \times t_1(0, \sigma_1^2, 1)$, and in Figure 4(d) $\Pi_1 = t_1(0, 10^2, 1) \times t_1(0, 2.5^2, 1)$ and $\Pi_2 = N(0, \sigma_0^2) \times N(0, \sigma_1^2)$. Note that these plots only depend on the data through the values of $X_1$.

Fig. 4. Weakly informative $\Pi_2$ priors relative to $\Pi_1$ at level 0.05 (light and dark shading) and uniformly weakly informative (light shading) where (a) $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$ and $\Pi_2 = N(0, \sigma_0^2) \times N(0, \sigma_1^2)$, (b) $\Pi_1 = t_1(0, 10^2, 1) \times t_1(0, 2.5^2, 1)$ and $\Pi_2 = t_1(0, \sigma_0^2, 1) \times t_1(0, \sigma_1^2, 1)$, (c) $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$ and $\Pi_2 = t_1(0, \sigma_0^2, 1) \times t_1(0, \sigma_1^2, 1)$ and (d) $\Pi_1 = t_1(0, 10^2, 1) \times t_1(0, 2.5^2, 1)$ and $\Pi_2 = N(0, \sigma_0^2) \times N(0, \sigma_1^2)$.



We see clearly from these plots that increasing the scaling on any of the $\beta_j$ does not necessarily lead to weak informativity and in fact inevitably destroys it. Furthermore, a smaller scaling on a parameter can lead to uniform weak informativity. These plots underscore how our intuition does not work very well with the logistic regression model, as it is not clear how priors on the $\beta_j$ ultimately translate to priors on the $p_i$. In fact, it can be proven that, if we put independent priors on the $\beta_j$, fix all the scalings but one, and let that scaling grow arbitrarily large, then the prior predictive distribution of $T$ converges to a distribution concentrated on two points, for example, when the scaling on $\beta_0$ increases these points are given by $\{\sum_{i=1}^q T_i = 0\} \cup \{\sum_{i=1}^q T_i = \sum_{i=1}^q n_i\}$, and this is definitely not desirable. This partially explains the results obtained.

Fig. 5. Reduction levels of $N(0, \sigma_0^2) \times N(0, \sigma_1^2)$ relative to $N(0, 10^2) \times N(0, 2.5^2)$ priors using (5) when $\gamma = 0.05$. The plotted reduction levels are 0% (solid line), 25% (dashed line), 50% (dotted line) and 75% (long dashed line).

Of some interest is how much reduction we actually get, via (5), when we employ a weakly informative prior. In Figure 5 we have plotted contours of the choices of $(\sigma_0, \sigma_1)$ that give 0%, 25%, 50% and 75% reduction in prior-data conflicts for the case where $\Pi_2 = N(0, \sigma_0^2) \times N(0, \sigma_1^2)$ and $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$ when $\gamma = 0.05$ (this corresponds to $x_\gamma = 0.0503$). Note that a substantial reduction can be obtained.

We can also consider fixing one of the scalings and seeing how much reduction we obtain when varying the other. For example, when we fix $\sigma_0 = 2.5$ we find that the maximum reduction is obtained when $\sigma_1$ is close to 2.2628, while if we fix $\sigma_1 = 2.5$, then the maximum reduction is obtained when $\sigma_0$ is close to 0.875.

It makes sense in any application to check to see if any prior-data conflict exists with respect to the base prior. If there is no prior-data conflict, this increases our confidence that the weakly informative prior is indeed putting less information into the analysis. This is assessed generally using (3), although (2) suffices in this example. When $\Pi_1 = N(0, 10^2) \times N(0, 2.5^2)$, then (2) equals 0.1073 and when $\Pi_1 = t_1(0, 10^2, 1) \times t_1(0, 2.5^2, 1)$ (the prior used in Gelman et al., 2008), then (2) equals 0.1130, so in neither case is there any evidence of prior-data conflict.
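A check of this kind can be approximated by simulation. The sketch below is an assumption-laden illustration rather than the computation behind the values quoted above: it reads (2) as $M_T(m_T(t) \le m_T(t_0))$, estimates $m_T$ by Monte Carlo over draws from the normal base prior, standardizes log dose with the population standard deviation, and uses hypothetical function names. With so few draws the result is only a noisy estimate and is not expected to reproduce 0.1073 exactly.

```python
import random
from itertools import product
from math import comb, exp, log, sqrt

doses, n_i, t_obs = [0.422, 0.744, 0.948, 2.069], [5, 5, 5, 5], (0, 1, 3, 5)

# Assumption: log dose standardized to mean 0 and population sd 1/2.
logd = [log(d) for d in doses]
mean = sum(logd) / len(logd)
sd = sqrt(sum((v - mean) ** 2 for v in logd) / len(logd))
x1 = [0.5 * (v - mean) / sd for v in logd]


def binomial_row(n, p):
    """Binomial(n, p) probabilities for t = 0..n."""
    return [comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)]


def prior_predictive_mc(sd0, sd1, draws=200, seed=1):
    """Monte Carlo estimate of m_T(t) for every outcome t under N(0, sd0^2) x N(0, sd1^2)."""
    rng = random.Random(seed)
    outcomes = list(product(*(range(n + 1) for n in n_i)))
    m = dict.fromkeys(outcomes, 0.0)
    for _ in range(draws):
        b0, b1 = rng.gauss(0.0, sd0), rng.gauss(0.0, sd1)
        rows = [binomial_row(n, 1.0 / (1.0 + exp(-(b0 + b1 * x))))
                for n, x in zip(n_i, x1)]
        for t in outcomes:
            prob = 1.0
            for row, ti in zip(rows, t):
                prob *= row[ti]
            m[t] += prob / draws
    return m


def p_value(m, t0):
    """Estimate of (2), read here as M_T(m_T(t) <= m_T(t0))."""
    return sum(q for q in m.values() if q <= m[t0])


m = prior_predictive_mc(10.0, 2.5)
print(p_value(m, t_obs))
```

Increasing `draws`, or replacing the Monte Carlo average by numerical integration over $(\beta_0, \beta_1)$, would reduce the noise at the cost of run time.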

5. REFINEMENTS BASED UPON 
ANCILLARITY 

Consider an ancillary statistic that is a function of the minimal sufficient statistic, say, $U(T)$. The variation due to $U(T)$ is independent of $\theta$ and so should be removed from the P-value (3) when checking for prior-data conflict. Removing this variation is equivalent to conditioning on $U(T)$ and so we replace (3) by

(9) $M_T(m^*_T(t) \le m^*_T(t_0) \mid U(T))$,

that is, we use the conditional prior predictive given the ancillary $U(T)$. To remove the maximal amount of ancillary variation, we must have that $U(T)$ is a maximal ancillary. Therefore, (4) becomes

(10) $M_{1T}(P_2(t_0 \mid U(T)) \le x_\gamma \mid U(T))$,

that is, we have replaced $P_2(t_0)$ by $P_2(t_0 \mid U(T)) = M_{2T}(m^*_{2T}(t) \le m^*_{2T}(t_0) \mid U(T))$ and $M_{1T}$ by $M_{1T}(\cdot \mid U(T))$.

We note that the approach discussed in Section 2 
works whenever T is a complete minimal sufficient 
statistic. This is a consequence of Basu's Theorem, 
as, in such a case, any ancillary is statistically inde- 
pendent of T and so conditioning on such an ancil- 
lary is irrelevant. This is the case for the examples 
in Sections 3 and 4. 

One problem with ancillaries is that multiple maximal ancillaries may exist. When ancillaries are used for frequentist inferences about $\theta$ via conditioning, this poses a problem because it is not clear which maximal ancillary to use and confidence regions depend on the maximal ancillary chosen. For checking for prior-data conflict via (9), however, this does not pose a problem. This is because we simply get different checks depending on which maximal ancillary we condition on. For example, if conditioning on maximal ancillary $U_1(T)$ does not lead to prior-data conflict, but conditioning on maximal ancillary $U_2(T)$ does, then we have evidence against no prior-data conflict existing.

Similarly, when we go to use (10), we can also simply look at the effect of each maximal ancillary on the analysis and make our assessment about $\Pi_2$ based on this. For example, we can use the maximum value of (10) over all maximal ancillaries to assess whether or not $\Pi_2$ is weakly informative relative to $\Pi_1$. When this maximum is small, we conclude that we have a small prior probability of finding evidence against the null hypothesis of no prior-data conflict when using $\Pi_2$. We illustrate this via an example.

Fig. 6. Plot of all $(\alpha, \beta)$ corresponding to Beta$(\alpha, \beta)$ priors that are weakly informative at level $\gamma = 0.05$ (light and dark shading) and uniformly weakly informative (light shading).

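Example 1. Suppose that we have a sample of $n$ from the Multinomial$(1, (1 - \theta)/6, (1 + \theta)/6, (2 - \theta)/6, (2 + \theta)/6)$ distribution where $\theta \in [-1, 1]$ is unknown. Then the counts $(f_1, f_2, f_3, f_4)$ constitute a minimal sufficient statistic and $U_1 = (f_1 + f_2, f_3 + f_4)$ is ancillary, as is $U_2 = (f_1 + f_4, f_2 + f_3)$. Then $T = (f_1, f_2, f_3, f_4) \mid U_1$ is given by $f_1 \mid U_1 \sim \mathrm{Binomial}(f_1 + f_2, (1 - \theta)/2)$ independent of $f_3 \mid U_1 \sim \mathrm{Binomial}(f_3 + f_4, (2 - \theta)/4)$, giving

$$m_T(f_1, f_2, f_3, f_4 \mid U_1) = \binom{f_1 + f_2}{f_1}\binom{f_3 + f_4}{f_3}\int_{-1}^{1}\left(\frac{1 - \theta}{2}\right)^{f_1}\left(\frac{1 + \theta}{2}\right)^{f_2}\left(\frac{2 - \theta}{4}\right)^{f_3}\left(\frac{2 + \theta}{4}\right)^{f_4}\pi(\theta)\,d\theta.$$

We then have two 1-dimensional distributions $f_1 \mid U_1$ and $f_3 \mid U_1$ to use for checking for prior-data conflict. A similar result holds for the conditional distribution given $U_2$.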

For example, suppose $\pi$ is a Beta(20, 20) distribution on $[-1, 1]$, so the prior concentrates about 0, and for a sample of $n = 18$ we have that $U_1 = f_1 + f_2 = 10$ and $U_2 = f_1 + f_4 = 8$. In Figure 6 we have plotted all the values of $(\alpha, \beta)$ that correspond to a Beta$(\alpha, \beta)$ prior that is weakly informative relative to the Beta(20, 20) prior at level $\gamma = 0.05$, as well as those that are uniformly weakly informative. So for each such $(\alpha, \beta)$ we have that (10) is less than or equal to 0.05 for both $U = U_1$ and $U = U_2$.
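The conditional prior predictive given $U_1$ in Example 1 can be evaluated numerically. The sketch below is illustrative only: it assumes that a Beta$(a, b)$ distribution on $[-1, 1]$ means $\theta = 2B - 1$ with $B \sim \mathrm{Beta}(a, b)$, integrates over $\theta$ with a simple midpoint rule, and uses hypothetical function names. The group sizes 10 and 8 correspond to $f_1 + f_2 = 10$ and $f_3 + f_4 = 8$ when $n = 18$.

```python
from math import comb, exp, lgamma, log


def beta_density_on_interval(theta, a, b):
    """Density of theta = 2B - 1 with B ~ Beta(a, b), so theta lies in (-1, 1)."""
    u = (theta + 1) / 2
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return 0.5 * exp(log_norm + (a - 1) * log(u) + (b - 1) * log(1 - u))


def conditional_prior_predictive(n12, n34, a, b, grid=2000):
    """m_T(f1, f3 | U1) with f1 + f2 = n12 and f3 + f4 = n34, by the midpoint rule in theta."""
    h = 2.0 / grid
    thetas = [-1 + (j + 0.5) * h for j in range(grid)]
    weights = [h * beta_density_on_interval(th, a, b) for th in thetas]
    m = {}
    for f1 in range(n12 + 1):
        for f3 in range(n34 + 1):
            f2, f4 = n12 - f1, n34 - f3
            integral = sum(
                w * ((1 - th) / 2) ** f1 * ((1 + th) / 2) ** f2
                * ((2 - th) / 4) ** f3 * ((2 + th) / 4) ** f4
                for th, w in zip(thetas, weights))
            m[(f1, f3)] = comb(n12, f1) * comb(n34, f3) * integral
    return m


m_cond = conditional_prior_predictive(10, 8, 20, 20)
print(sum(m_cond.values()))
```

A conditional P-value of the form (9) could then be obtained by summing `m_cond` over the outcomes whose value does not exceed that of the observed $(f_1, f_3)$, again under the assumptions stated above.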



6. CONCLUSIONS 

We have developed an approach to measuring the 
amount of information a prior puts into a statisti- 
cal analysis relative to another base prior. This base 
prior can be considered as the prior that best re- 
flects current information and our goal is to deter- 
mine a prior that is weakly informative with respect 
to it. Our measure is in terms of the prior predic- 
tive probability, using the base prior, of obtaining 
a prior-data conflict. This was applied in several 
examples where the approach is seen to give intu- 
itively reasonable results. The examples chosen here 
focused on commonly used prior families. In several 
cases these were conjugate families, although there 
is no special advantage computationally to conju- 
gacy in this context. 

As noted in several examples, we need to be care- 
ful when we conceive of a prior being weakly infor- 
mative relative to another. Ultimately this concept 
needs to be made precise and we feel our definition 
is a reasonable proposal. The definition has intu- 
itive support, in terms of avoiding prior-data con- 
flicts, and provides a quantifiable criterion that can 
be used to select priors. 

In any application we should still check for prior- 
data conflict for the base prior using (3). If prior- 
data conflict is found, a substitute prior that is weak- 
ly informative relative to the base prior can then 
be selected and a check made for prior-data con- 
flict with respect to the new prior. While selecting 
the prior based on the observed data is not ideal, 
this process at least seems defensible from a logical 
perspective. For example, the new prior still incor- 
porates some of the information from the base prior 
and is not entirely driven by the data. Certainly, in 
the end it seems preferable to base an analysis on 
a prior for which a prior-data conflict does not exist. 
Of course, we must still report the original conflict 
and how this was resolved. 

We have restricted our discussion here to proper priors. The concept of weak informativity is obviously related to the idea of noninformativity and improper priors. Certainly any prior that has a claim to being noninformative should not lead to prior-data conflict. At this time, however, there is no precise definition of what a noninformative prior is, whereas we have provided a definition of a weakly informative prior. In the examples of Sections 3.1 and 3.2 we see that if the spread of $\Pi_2$ is made large enough, then $\Pi_2$ is uniformly weakly informative with respect to the base prior. This suggests that the flat improper prior, which is Jeffreys' prior for this problem, can be thought of as always being uniformly weakly informative. The logistic regression example of Section 4.3 suggests caution, however, in interpreting increased diffuseness as a characterization of weak informativity. In the binomial example of Section 4.1 the uniform prior is always weakly informative with respect to the base prior, while the Beta(1/2, 1/2) (Jeffreys') prior is not. Further work is required for a full examination of the relationships among the concepts of prior-data conflict, noninformativity and weak informativity.

APPENDIX 

Proof of Lemma 1. We have that $x_\gamma = \gamma$ since $P_1(t)$ has a continuous distribution under $M_{1T}$. Suppose $m^*_{iT}(t)$ has a point mass at $r_0$ when $t \sim M_{iT}$. The assumption $M_{iT}(m^*_{iT}(t) = r_0) > 0$ implies $(m^*_{iT})^{-1}\{r_0\} \ne \emptyset$. Then, pick $t_{r_0} \in (m^*_{iT})^{-1}\{r_0\}$ so that $m^*_{iT}(t_{r_0}) = r_0$ and let $\eta_i = P_i(t_{r_0})$. Then, $P_i(t)$ has point mass at $\eta_i$ because $M_{iT}(P_i(t) = \eta_i) \ge M_{iT}(m^*_{iT}(t) = m^*_{iT}(t_{r_0})) = M_{iT}(m^*_{iT}(t) = r_0) > 0$. This is a contradiction and so $m^*_{iT}(t)$ has a continuous distribution when $t \sim M_{iT}$.

Let $r_\gamma = \sup\{r \in \mathcal{R} : M_{2T}(m^*_{2T}(t) \le r) \le \gamma\}$ where $\mathcal{R} = \{m^*_{2T}(t) : t \in \mathcal{T}\}$ and $\mathcal{T}$ is the range space of $T$. Then, $M_{2T}(m^*_{2T}(t) \le r_\gamma) = \gamma$ and $M_{2T}(m^*_{2T}(t) \le r_\gamma + \varepsilon) > \gamma$ for all $\varepsilon > 0$. Thus, we have that $\{t : P_2(t) \le \gamma\} = \{t : m^*_{2T}(t) \le r_\gamma\}$, $M_{1T}(P_2(t) \le \gamma) = M_{1T}(m^*_{2T}(t) \le r_\gamma)$, and $\Pi_2$ is weakly informative at level $\gamma$ relative to $\Pi_1$ if and only if $M_{1T}(m^*_{2T}(t) \le r_\gamma) \le \gamma$. The fact that $\{r_\gamma : \gamma \in [0, 1]\} \subset \mathcal{R}$ implies the last statement. □

Proof of Theorem 1. Suppose first that $\Sigma_1 \le \Sigma_2$. We have that $n^{-1}I + \Sigma_1 \le n^{-1}I + \Sigma_2$ and so $(n^{-1}I + \Sigma_1)^{-1} \ge (n^{-1}I + \Sigma_2)^{-1}$. This implies that (7) is less than $\gamma$ and so the $N_k(\mu_0, \Sigma_2)$ prior is uniformly weakly informative relative to the $N_k(\mu_0, \Sigma_1)$ prior.

For the converse put $V_i = \{y : y'(n^{-1}I + \Sigma_i)^{-1}y \le 1\}$. If $V_1 \subset V_2$, then for $y \in R^k \setminus \{0\}$ there exists $c > 0$ such that $c^2 y'(n^{-1}I + \Sigma_1)^{-1}y = 1$ which implies $cy \in V_2$ and so $c^2 y'(n^{-1}I + \Sigma_2)^{-1}y \le 1$. This implies that $y'(n^{-1}I + \Sigma_1)^{-1}y \ge y'(n^{-1}I + \Sigma_2)^{-1}y$ and so $\Sigma_1 \le \Sigma_2$ and the result follows. If $V_2 \subset V_1$, then the same reasoning says that $\Sigma_2 \le \Sigma_1$ and (7) would be greater than $\gamma$ if $\Sigma_2 < \Sigma_1$.

So we need only consider the case where $V_1 \cap V_2^c$, $V_1^c \cap V_2$ both have positive volumes, that is, we are supposing that neither $\Sigma_2 - \Sigma_1$ nor $\Sigma_1 - \Sigma_2$ is positive semidefinite and then will obtain a contradiction. Let $\delta = \min\{y'(n^{-1}I + \Sigma_1)^{-1}y : y \in V_1 \cap \partial V_2\}$ and note that $\delta < 1$, since $V_1^{\circ} \cap \partial V_2 \ne \emptyset$, that is, there are points in the interior of $V_1$ on the boundary of $V_2$. Now put $V_0 = \{y \in V_1 \cap V_2^c : y'(n^{-1}I + \Sigma_1)^{-1}y \le (1 + \delta)/2\}$ and note that $V_0$ has positive volume.

Let $Y \sim N_k(0, n^{-1}I + \Sigma_1)$ and $\tau_\gamma^2 = G_k^{-1}(1 - \gamma)$. Then $M_{1T}(P_1(t) \le \gamma) = P(Y'(n^{-1}I + \Sigma_1)^{-1}Y \ge \tau_\gamma^2) = P(Y \notin \tau_\gamma V_1) = 1 - P_Y(\tau_\gamma(V_1 \cap V_2) \cup \tau_\gamma(V_1 \cap V_2^c))$ while $M_{1T}(P_2(t) \le \gamma) = P(Y'(n^{-1}I + \Sigma_2)^{-1}Y \ge \tau_\gamma^2) = P(Y \notin \tau_\gamma V_2) = 1 - P_Y(\tau_\gamma(V_1 \cap V_2) \cup \tau_\gamma(V_1^c \cap V_2))$. Since $\gamma = M_{1T}(P_1(t) \le \gamma)$, we need only show that $P_Y(\tau_\gamma(V_1 \cap V_2^c)) > P_Y(\tau_\gamma(V_1^c \cap V_2))$ for all $\gamma$ sufficiently small, to establish the result.

Let $f(x) = k_1 e^{-x/2}$ be such that $f(y'(n^{-1}I + \Sigma_1)^{-1}y)$ is the density of $Y$. Then $P_Y(\tau_\gamma(V_1^c \cap V_2)) \le f(\tau_\gamma^2\, y_*'(n^{-1}I + \Sigma_1)^{-1}y_*)\,\mathrm{Vol}(V_1^c \cap V_2)\,\tau_\gamma^k$ where $y_* = \arg\min\{y'(n^{-1}I + \Sigma_1)^{-1}y : y \in V_1^c \cap V_2\}$. Note it is clear that $y_* \in \partial V_1$ and so $y_*'(n^{-1}I + \Sigma_1)^{-1}y_* = 1$ and $f(\tau_\gamma^2\, y_*'(n^{-1}I + \Sigma_1)^{-1}y_*) = k_1 e^{-\tau_\gamma^2/2}$. Also, $P_Y(\tau_\gamma(V_1 \cap V_2^c)) \ge P_Y(\tau_\gamma V_0) = \int_{\tau_\gamma V_0} f(y'(n^{-1}I + \Sigma_1)^{-1}y)\,dy \ge f(\tau_\gamma^2(1 + \delta)/2)\,\mathrm{Vol}(V_0)\,\tau_\gamma^k$ where $f(\tau_\gamma^2(1 + \delta)/2) = k_1 e^{-\tau_\gamma^2(1 + \delta)/4}$. Therefore, as $\gamma \to 0$,

$$\frac{P_Y(\tau_\gamma(V_1 \cap V_2^c))}{P_Y(\tau_\gamma(V_1^c \cap V_2))} \ge e^{\tau_\gamma^2(1 - \delta)/4}\,\frac{\mathrm{Vol}(V_0)}{\mathrm{Vol}(V_1^c \cap V_2)} \to \infty,$$

since $\tau_\gamma = (G_k^{-1}(1 - \gamma))^{1/2} \to \infty$ as $\gamma \to 0$ and $0 < \delta < 1$. □



Proof of Theorem 2. First note that we can use (2) instead of (3) since $J_T(x)$ is constant in this case. We assume without loss of generality that $\mu_0 = 0$.

We first establish several useful technical results. If $\Pi_i$ is a probability distribution that is unimodal and symmetric about 0, and $\varphi_v$ denotes a $N(0, v)$ density, we have that $m_{iT}(t) = \int_R \varphi_{1/n}(t - \mu)\,\Pi_i(d\mu)$ is unimodal and symmetric about 0. We have the following result.

Lemma A.1. If $T$ is a minimal sufficient statistic, $J_T(x)$ is constant in $x$, $\Pi_1$ and $\Pi_2$ are unimodal and symmetric about 0, the $P_i(t)$ have continuous distributions when $t \sim M_{iT}$, $m_{1T}(0) > m_{2T}(0)$, and $m_{1T}(t) = m_{2T}(t)$ has a unique solution for $t > 0$, then $\Pi_2$ is uniformly weakly informative relative to $\Pi_1$.

Proof. By the unimodality and symmetry of $m_{iT}$, we must have that $P_i(t) = M_{iT}(m_{iT}(u) \le m_{iT}(t)) = M_{iT}(|u| \ge |t|)$. We show $M_{1T}(|t| \ge t_0) \le M_{2T}(|t| \ge t_0)$ for all $t_0 > 0$ because it is equivalent to $\Pi_2$ being uniformly weakly informative relative to $\Pi_1$ by Lemma 1. Let $t_s$ be the solution of $m_{1T}(t) = m_{2T}(t)$ on $(0, \infty)$. From the unique solution assumption, $m_{1T}(t) > m_{2T}(t)$ for $t \in (0, t_s)$ and $m_{1T}(t) < m_{2T}(t)$ for $t > t_s$. For $0 < t_0 \le t_s$, $M_{1T}(|t| \ge t_0) = 2\int_{t_0}^{\infty} m_{1T}(t)\,dt = 1 - 2\int_0^{t_0} m_{1T}(t)\,dt < 1 - 2\int_0^{t_0} m_{2T}(t)\,dt = 2\int_{t_0}^{\infty} m_{2T}(t)\,dt = M_{2T}(|t| \ge t_0)$ and for $t_0 > t_s$, $M_{1T}(|t| \ge t_0) = 2\int_{t_0}^{\infty} m_{1T}(t)\,dt < 2\int_{t_0}^{\infty} m_{2T}(t)\,dt = M_{2T}(|t| \ge t_0)$. Thus, we are done. □

We can apply Lemma A.l to comparing normal 
and t priors when sampling from a normal. 

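Lemma A.2. Suppose we have a sample of $n$ from a location normal model, $\Pi_1$ is a $N(0, \sigma_1^2)$ prior and $\Pi_2$ is a $t_1(0, \sigma_2^2, \lambda)$ prior. If $m_{1T}(0) > m_{2T}(0)$, then $\Pi_2$ is uniformly weakly informative relative to $\Pi_1$.

Proof. We have that $m_{1T} = \varphi_{1/n + \sigma_1^2}$ and, using the representation of the $t(\lambda)$ distribution as a gamma mixture of normals, we write $m_{2T}(t) = \int_0^{\infty} \varphi_{1/n + \sigma_2^2/u}(t)\,k_\lambda(u)\,du$ where $k_\lambda$ is the density of the $\mathrm{Gamma}_{\mathrm{rate}}(\lambda/2, \lambda/2)$ distribution. By the symmetry of $\varphi_v$, $m_{2T}$ is symmetric. Also, $\varphi_v(t_1) > \varphi_v(t_2)$ for $0 \le t_1 < t_2$ and so $m_{2T}(t_1) = \int \varphi_{1/n + \sigma_2^2/u}(t_1)\,k_\lambda(u)\,du > \int \varphi_{1/n + \sigma_2^2/u}(t_2)\,k_\lambda(u)\,du = m_{2T}(t_2)$. Thus, $m_{2T}$ is decreasing on $(0, \infty)$, that is, $m_{2T}$ is unimodal. To show that $m_{2T}(t)$ is log-convex with respect to $t^2$, we prove that $(d^2/d(t^2)^2)\log m_{2T}(t) \ge 0$. Note that $(d/d(t^2))\varphi_v(t) = (d/d(t^2))[(2\pi v)^{-1/2}\exp\{-t^2/2v\}] = -\varphi_v(t)/2v$,

$$\frac{d\,m_{2T}(t)}{d t^2} = -\int_0^{\infty} \frac{\varphi_{1/n + \sigma_2^2/u}(t)}{2(1/n + \sigma_2^2/u)}\,k_\lambda(u)\,du,$$

$$\frac{d^2 \log m_{2T}(t)}{d(t^2)^2} = \frac{1}{m_{2T}(t)}\int_0^{\infty} \frac{\varphi_{1/n + \sigma_2^2/u}(t)}{[2(1/n + \sigma_2^2/u)]^2}\,k_\lambda(u)\,du - \frac{1}{m_{2T}(t)^2}\left(\int_0^{\infty} \frac{\varphi_{1/n + \sigma_2^2/u}(t)}{2(1/n + \sigma_2^2/u)}\,k_\lambda(u)\,du\right)^2$$

and so $d^2 \log m_{2T}(t)/d(t^2)^2 = \mathrm{Var}_V([2(1/n + \sigma_2^2/V)]^{-1}) \ge 0$, where $V$ is the random variable having density $\varphi_{1/n + \sigma_2^2/v}(t)\,k_\lambda(v)/m_{2T}(t)$. Thus, $m_{2T}(t)$ is log-convex in $t^2$.

The functions $m_{1T}(t)$ and $m_{2T}(t)$ meet in at most two points on $(0, \infty)$ because $\log m_{1T}(t)$ is linear in $t^2$ and $\log m_{2T}(t)$ is convex in $t^2$. Also, $m_{1T}(t)$ and $m_{2T}(t)$ share at least one point on $(0, \infty)$ because $m_{1T}(0) > m_{2T}(0)$, and the following shows that $m_{1T}(t) < m_{2T}(t)$ for all large $t$. Note first that if $u \ge \sigma_2^2/2\sigma_1^2$, then $(1/n + \sigma_1^2)/(1/n + \sigma_2^2/u) \ge 1/2$ and $t^2/(u/n + \sigma_2^2) \le (2\sigma_1^2/\sigma_2^2)\,t^2/(1/n + 2\sigma_1^2)$. Then,

$$\frac{m_{2T}(t)}{m_{1T}(t)} \ge \int_{\sigma_2^2/2\sigma_1^2}^{\infty} \frac{(\lambda/2)^{\lambda/2}}{\Gamma(\lambda/2)}\,\frac{(2\pi(1/n + \sigma_2^2/u))^{-1/2}}{(2\pi(1/n + \sigma_1^2))^{-1/2}}\,u^{\lambda/2 - 1}\,\frac{\exp\{-(u/2)(\lambda + t^2/(u/n + \sigma_2^2))\}}{\exp\{-(1/2)t^2/(1/n + \sigma_1^2)\}}\,du$$

$$\ge \frac{(\lambda/2)^{\lambda/2}}{\Gamma(\lambda/2)}\,\frac{1}{\sqrt{2}}\left(\frac{\sigma_2^2}{2\sigma_1^2}\right)^{\lambda/2 - 1}\int_{\sigma_2^2/2\sigma_1^2}^{\infty} \exp\left\{-\frac{u}{2}\left(\lambda + \frac{(2\sigma_1^2/\sigma_2^2)\,t^2}{1/n + 2\sigma_1^2}\right)\right\}\left[\exp\left\{-\frac{(1/2)\,t^2}{1/n + \sigma_1^2}\right\}\right]^{-1} du$$

$$= \frac{(\lambda/2)^{\lambda/2}}{\Gamma(\lambda/2)}\,\frac{1}{\sqrt{2}}\left(\frac{\sigma_2^2}{2\sigma_1^2}\right)^{\lambda/2 - 1}\exp\{-(1/2)(\sigma_2^2/2\sigma_1^2)\lambda\}\cdot\exp\left\{\frac{t^2}{2}\left(\frac{1}{1/n + \sigma_1^2} - \frac{1}{1/n + 2\sigma_1^2}\right)\right\}\cdot 2\left(\lambda + \frac{(2\sigma_1^2/\sigma_2^2)\,t^2}{1/n + 2\sigma_1^2}\right)^{-1} \to \infty$$

as $t \to \infty$.

The above conditions together imply that $m_{1T}(t)$ and $m_{2T}(t)$ meet in exactly one point on $(0, \infty)$. Therefore, $\Pi_2$ is uniformly weakly informative relative to $\Pi_1$ by Lemma A.1. □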

Since $\int_0^{\infty}(1/n + \sigma_2^2/u)^{-1/2}k_\lambda(u)\,du$ is strictly decreasing in $\sigma_2^2$, we see that $m_{1T}(0) = (2\pi(1/n + \sigma_1^2))^{-1/2} > m_{2T}(0) = (2\pi)^{-1/2}\int_0^{\infty}(1/n + \sigma_2^2/u)^{-1/2}k_\lambda(u)\,du$ is equivalent to $\sigma_2^2 > \sigma_{0n}^2$ where $\sigma_{0n}^2$ satisfies $(1/n + \sigma_1^2)^{-1/2} = \int_0^{\infty}(1/n + \sigma_{0n}^2/u)^{-1/2}k_\lambda(u)\,du$. This proves the first part of Theorem 2.

We also need the following results for the remain- 
ing parts of Theorem 2. 

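Lemma A.3. (i) $\sigma_{0n}^2/\sigma_1^2$ increases as $n\sigma_1^2 \to \infty$, (ii) $\sigma_{0n}^2/\sigma_1^2 \to (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$ as $n\sigma_1^2 \to \infty$.

Proof. (i) We have $n^{-1/2}(1/n + \sigma_1^2)^{-1/2} = n^{-1/2}\int_0^{\infty}(1/n + \sigma_{0n}^2/u)^{-1/2}k_\lambda(u)\,du$ and putting $\alpha = n\sigma_1^2$, $\beta = n\sigma_{0n}^2$, we can write this as

(A.1) $(1 + \alpha)^{-1/2} = \int_0^{\infty}(1 + \beta/u)^{-1/2}k_\lambda(u)\,du.$

Differentiating both sides of (A.1) with respect to $\alpha$, we have $(1 + \alpha)^{-3/2} = \int_0^{\infty}(1 + \beta/u)^{-3/2}u^{-1}k_\lambda(u)\,du\,(d\beta/d\alpha)$. If we let $U \sim \mathrm{Gamma}_{\mathrm{rate}}(\lambda/2, \lambda/2)$, then this integral can be written as the expectation

$$E((1 + \beta/U)^{-3/2}U^{-1}) = E((1 + \beta/U)^{-3/2}(\beta/U + 1 - 1)/\beta)$$
$$= \beta^{-1}E((1 + \beta/U)^{-1/2}) - \beta^{-1}E((1 + \beta/U)^{-3/2})$$
$$\le \beta^{-1}E((1 + \beta/U)^{-1/2}) - \beta^{-1}\{E((1 + \beta/U)^{-1/2})\}^3$$
$$= \beta^{-1}(1 + \alpha)^{-1/2} - \beta^{-1}(1 + \alpha)^{-3/2}$$
$$= (1 + \alpha)^{-3/2}(\alpha/\beta),$$

where the inequality follows via Jensen's inequality. Hence, $d\beta/d\alpha = (1 + \alpha)^{-3/2}[E((1 + \beta/U)^{-3/2}U^{-1})]^{-1} \ge \beta/\alpha$ and so $\beta/\alpha$ is an increasing function of $\alpha$ because $d(\beta/\alpha)/d\alpha = \alpha^{-1}(d\beta/d\alpha) - \beta/\alpha^2 \ge 0$. This proves $\sigma_{0n}^2/\sigma_1^2 = n\sigma_{0n}^2/n\sigma_1^2 = \beta/\alpha$ increases as $\alpha = n\sigma_1^2 \to \infty$.

(ii) It is easy to check that $\beta = 0$ when $\alpha = 0$ and $\beta > 0$ for $\alpha > 0$. Let $\alpha_0, \beta_0$ be a pair satisfying $\alpha_0 > 0$ and (A.1). Then, $\beta/\alpha \ge \beta_0/\alpha_0 > 0$ for $\alpha > \alpha_0$ and $\beta \to \infty$ as $\alpha \to \infty$. Therefore,

$$\lim_{\alpha \to \infty}\left(\frac{\beta}{\alpha}\right)^{1/2} = \lim_{\alpha \to \infty}\frac{\sqrt{\beta}}{\sqrt{1 + \alpha}} = \lim_{\beta \to \infty} E\left(\frac{\sqrt{\beta}}{\sqrt{1 + \beta/U}}\right) = \lim_{\beta \to \infty} E\left(\frac{\sqrt{U}}{\sqrt{1 + U/\beta}}\right) = E(\sqrt{U})$$
$$= \int_0^{\infty}\sqrt{u}\,k_\lambda(u)\,du = (2/\lambda)^{1/2}\,\Gamma((\lambda + 1)/2)/\Gamma(\lambda/2)$$

and this proves (ii). □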

Lemma A.4. Suppose we have a sample of $n$ from a location normal model, $\Pi_1$ is a $N(0, \sigma_1^2)$ prior and $\Pi_2$ is a $t_1(0, \sigma_2^2, \lambda)$ prior. Then $\Pi_2$ is asymptotically uniformly weakly informative relative to $\Pi_1$ if and only if $\sigma_2^2/\sigma_1^2 \ge (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$.

Proof. Suppose that $\sigma_2^2/\sigma_1^2 \ge (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$. Then by Lemma A.3 $\sigma_2^2/\sigma_1^2 \ge \sigma_{0n}^2/\sigma_1^2$ for all $n$ and so $\Pi_2$ is uniformly weakly informative with respect to $\Pi_1$ for all $n$. So (4) is bounded above by $\gamma$ for all $n$ and so the limiting value of (4) is also bounded above by $\gamma$. This establishes that $\Pi_2$ is asymptotically uniformly weakly informative relative to $\Pi_1$.

Suppose now that $\sigma_2^2/\sigma_1^2 < (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$. Note that $m_{1T}(t) = \lim_{n \to \infty} m_{1T,n}(t) = (2\pi\sigma_1^2)^{-1/2}\exp(-t^2/(2\sigma_1^2))$, and $m_{2T}(t) = \lim_{n \to \infty} m_{2T,n}(t) = \Gamma((\lambda + 1)/2)/(\Gamma(\lambda/2)\sqrt{\pi\lambda\sigma_2^2})\,(1 + t^2/(\sigma_2^2\lambda))^{-(\lambda + 1)/2}$. Therefore, we get $m_{1T}(0) = 1/\sqrt{2\pi\sigma_1^2} < \Gamma((\lambda + 1)/2)/(\Gamma(\lambda/2)\sqrt{\pi\lambda\sigma_2^2}) = m_{2T}(0)$. Let $B = \{t : m_{2T}(t) > m_{1T}(0)\}$ and $\gamma = M_{2T}(B^c)$. Then, $m_{1T}(t) \le m_{1T}(0) < m_{2T}(t)$ on $B$ and $M_{1T}(P_2(t) \le \gamma) = M_{1T}(B^c) = 1 - M_{1T}(B) = 1 - \int_B m_{1T}(t)\,dt \ge 1 - \int_B m_{1T}(0)\,dt > 1 - \int_B m_{2T}(t)\,dt = M_{2T}(B^c) = \gamma$. Hence, $\Pi_2$ is not weakly informative relative to $\Pi_1$ at level $\gamma$. Therefore, $\sigma_2^2/\sigma_1^2 \ge (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$. □

It is now immediate that $\sup_{\gamma \in [0,1]} G_1^{-1}(1 - \gamma)/H_{1,\lambda}^{-1}(1 - \gamma) = (2/\lambda)\Gamma^2((\lambda + 1)/2)/\Gamma^2(\lambda/2)$ and the proof of Theorem 2 is complete. □

PROOF of Theorem 3. Since the minimal suf- 
ficient statistic T{x) = x is linear, there is no volume 
distortion and we can use (2) instead of (3). The lim- 
iting prior predictive distribution of T(x) = x under 
Hi is iV(/io,Si) and under U 2 it is tki^o,^, A). It 
is easy to check that U\ = (T — ^qYT,^ 1 ^ — /xo) ~ 
X 2 (k) when T ~ U x and U 2 = (T-^to/Sj 1 (T-fj lQ ) ~ 
kFfrx when T~ n 2 . This implies that -P 2 , n (^o) con- 
verges toP 2 (*o) =n 2 (vr 2 (t) <7r 2 (t )) = l-H htX ((to- 
HoYYt^ito — fio)/k), where Hk t \ is the distribution 
function of an \ distribution. Further, we have 
that (4) converges to Hi(P 2 (i) < 7). 

Let Vt = {u € R k : u'Y>T l u < 1} for i = 1,2. By the 
continuity of n 2 (7r 2 (t) < r) function of r, and 
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the continuity of ^(t), there exists to such that 
Pity) < 7 if and only if 7r 2 (i) < ^(to)- Hence, IT2 
is asymptotically uniformly weakly informative rela- 
tive to 111 if and only if IIi(7r2(i) < 7r 2 (to)) < 
112(^2 (t) < vr2(to)) for all to 6 R k by Lemma 1. Since 
7T2(t) is decreasing in U2 = U2(t), the set {^(i) < 
vi"2(to)} = {u2(t) > u 2 (t )} = fi + u 2 (to)V 2 c . So we 
must prove that Il^o + r l / 2 V 2 c ) < U 2 (fi + r l / 2 V 2 c ) 
for all r > 0. 

The positive semidefiniteness of £ 2 — ffSi implies 
that E^ 1 / 7 ". 2 — S2 1 is positive semidefinite. Then, 
for u € Vo, that is, u'T, 2 u — 1 > we have u'X^u = 
r| • u^E^/t^Ju > r|u'S2 \ > t 2 . Thus, V£ C T X Vf. 

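Now we prove a stronger inequality $\Pi_1(\mu_0 + r^{1/2}\tau_2 V_1^c) \le \Pi_2(\mu_0 + r^{1/2}V_2^c)$ for all $r \ge 0$. Note
that

$$\Pi_1(\mu_0 + r^{1/2}\tau_2 V_1^c) = \Pi_1(u_1(t) \ge r\tau_2^2) = \int_{r\tau_2^2}^{\infty} \frac{2^{-k/2}}{\Gamma(k/2)}\, u^{k/2-1} e^{-u/2}\,du,$$

$$\Pi_2(\mu_0 + r^{1/2}V_2^c) = \Pi_2(u_2(t) \ge r) = \int_{r/k}^{\infty} \frac{\Gamma((k+\lambda)/2)}{\Gamma(k/2)\Gamma(\lambda/2)} \left(\frac{k}{\lambda}\right)^{k/2} u^{k/2-1} \left(1 + \frac{k}{\lambda}u\right)^{-(k+\lambda)/2} du,$$

and set $f(r) = \Pi_2(\mu_0 + r^{1/2}V_2^c) - \Pi_1(\mu_0 + r^{1/2}\tau_2 V_1^c)$. Then, $f(0) = 0$ and

$$\frac{df(r)}{dr} = \frac{2^{-k/2}}{\Gamma(k/2)} (r\tau_2^2)^{k/2-1} e^{-r\tau_2^2/2}\,\tau_2^2 - \frac{\Gamma((k+\lambda)/2)}{\Gamma(k/2)\Gamma(\lambda/2)} \left(\frac{k}{\lambda}\right)^{k/2} \left(\frac{r}{k}\right)^{k/2-1} (1 + r/\lambda)^{-(k+\lambda)/2}\,\frac{1}{k}$$

$$= \frac{(\tau_2^2/2)^{k/2}}{\Gamma(k/2)}\, r^{k/2-1} e^{-r\tau_2^2/2} - \frac{\Gamma((k+\lambda)/2)}{\Gamma(k/2)\Gamma(\lambda/2)}\, \lambda^{-k/2} r^{k/2-1} (1 + r/\lambda)^{-(k+\lambda)/2} = p_1 - p_2,$$

where $p_1$ and $p_2$ denote the first and second terms. Note that $p_1 - p_2 > 0$ is equivalent to $p_1/p_2 > 1$. Further,
recalling the definition of $\tau_2$ from the statement
of the theorem, this condition reads

$$\frac{p_1}{p_2} = \frac{\Gamma(\lambda/2)}{\Gamma((k+\lambda)/2)} \left(\frac{\lambda}{2}\right)^{k/2} \tau_2^{k}\, (1 + r/\lambda)^{(k+\lambda)/2} \exp(-r\tau_2^2/2) = (1 + r/\lambda)^{(k+\lambda)/2} \exp(-r\tau_2^2/2) > 1.$$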

The logarithm of $p_1/p_2$, given by $\log(p_1/p_2) = -r\tau_2^2/2 + ((k + \lambda)/2)\log(1 + r/\lambda)$, is concave as a function
of $r \ge 0$. Hence, $\log(p_1/p_2) = 0$ has exactly two solutions:
$r = 0$ and $r = r_s$. Because of its concavity, the
function $\log(p_1/p_2)$ is positive on $(0, r_s)$ and negative
on $(r_s, \infty)$. This implies that $f(r)$ is increasing
on $(0, r_s)$ and decreasing on $(r_s, \infty)$. Since $f(0) = 0$
and $\lim_{r \to \infty} f(r) = 0$, the function $f$ is nonnegative,
that is, $f(r) \ge 0$ for all $r \ge 0$. Thus, $\Pi_1(\mu_0 + r^{1/2}V_2^c) \le \Pi_1(\mu_0 + r^{1/2}\tau_2 V_1^c) \le \Pi_2(\mu_0 + r^{1/2}V_2^c)$ for
all $r \ge 0$. □
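
For $k = 2$ both tail probabilities in this argument have closed forms, which allows a quick sanity check. The Python sketch below is an illustration under a stated assumption: it takes $\tau_2^2 = (2/\lambda)[\Gamma((k+\lambda)/2)/\Gamma(\lambda/2)]^{2/k}$, the value that makes the gamma-function factor in $p_1/p_2$ equal to 1 (the theorem statement itself is not reproduced here). With $k = 2$ this gives $\tau_2^2 = 1$, the $\chi^2(2)$ tail is $e^{-x/2}$, and $P(2F_{2,\lambda} \ge r) = (1 + r/\lambda)^{-\lambda/2}$. All function names are hypothetical.

```python
import math


def tau2_squared(k: int, lam: float) -> float:
    # ASSUMPTION (not quoted from the theorem statement):
    # tau_2^2 = (2/lam) * [Gamma((k+lam)/2) / Gamma(lam/2)]^(2/k)
    log_g = math.lgamma((k + lam) / 2.0) - math.lgamma(lam / 2.0)
    return (2.0 / lam) * math.exp(2.0 * log_g / k)


def log_ratio(r: float, k: int, lam: float) -> float:
    """log(p1/p2) = -r*tau_2^2/2 + ((k+lam)/2) * log(1 + r/lam)."""
    t2 = tau2_squared(k, lam)
    return -r * t2 / 2.0 + 0.5 * (k + lam) * math.log1p(r / lam)


def positive_root(k: int, lam: float) -> float:
    """Bisection for the positive zero r_s of the concave log(p1/p2)."""
    t2 = tau2_squared(k, lam)
    lo = (k + lam) / t2 - lam  # maximiser of log_ratio
    if lo <= 0.0:
        raise ValueError("log(p1/p2) has no positive maximiser")
    hi = 2.0 * lo + 1.0
    while log_ratio(hi, k, lam) > 0.0:  # expand until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_ratio(mid, k, lam) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_k2(r: float, lam: float) -> float:
    """f(r) for k = 2: t-prior tail minus normal-prior tail."""
    t2 = tau2_squared(2, lam)
    return (1.0 + r / lam) ** (-lam / 2.0) - math.exp(-r * t2 / 2.0)


if __name__ == "__main__":
    lam = 3.0
    r_s = positive_root(2, lam)
    print("r_s =", r_s, " f(r_s) =", f_k2(r_s, lam))
```

In this special case $f(r) = (1 + r/\lambda)^{-\lambda/2} - e^{-r/2}$, which is nonnegative because $\log(1 + x) \le x$, and the numerical maximum of $f$ sits at the root $r_s$ of $\log(p_1/p_2)$, in line with the monotonicity argument.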

Proof of Theorem 4. Let $x_c^{-1} = \beta_1/(\alpha_1 + 1/2) = \beta_2/(\alpha_2 + 1/2)$. For $i = 1, 2$, let $\tau_i(t_0) = 1/(x_c t_i(t_0))$,
where the $t_i$ are the two solutions of $m_{2T}^*(t_i) = m_{2T}^*(t_0)$
(one of the $t_i$ equals $t_0$), so $0 < \tau_1 \le 1 \le \tau_2$. Note that
$\tau_2(t_0) = 1$ if and only if $t_0 = x_c^{-1}$ and then $\tau_1(t_0) = 1$
as well. Then, $\log(\tau_1/\tau_2) = \tau_1 - \tau_2$ and $d\tau_1/d\tau_2 = (\tau_2 - 1)\tau_1/[(\tau_1 - 1)\tau_2]$. Now $\{t : m_{2T}^*(t) \le m_{2T}^*(t_0)\} = \{t : 1/t \le x_c\tau_1(t_0) \text{ or } 1/t \ge x_c\tau_2(t_0)\}$. By Lemma 1
we have that uniform weak informativity is equivalent
to $M_{1T}(m_{2T}^*(t) \le m_{2T}^*(t_0)) \le M_{2T}(m_{2T}^*(t) \le m_{2T}^*(t_0))$ for all $t_0$, and so we must prove that
$M_{1T}(t \notin (t_2(t_0), t_1(t_0))) = M_{1T}(1/t \le x_c\tau_1(t_0) \text{ or } 1/t \ge x_c\tau_2(t_0)) = 1 - M_{1T}(x_c\tau_1(t_0) < 1/t < x_c\tau_2(t_0)) \le 1 - M_{2T}(x_c\tau_1(t_0) < 1/t < x_c\tau_2(t_0))$ for all $t_0$. Since $\tau_1$ is
implicitly a function of $\tau_2$, it is equivalent to prove
that $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2) \ge 0$ for all $\tau_2 > 1$. Using $(\tau_1/\tau_2)^{\alpha} = \exp(\alpha(\tau_1 - \tau_2))$, we have that the derivatives of the
two terms with respect to $\tau_2$ are given by

$$p_1 = c_1(x_c\tau_2)^{\alpha_1 - 1} e^{-\beta_1 x_c \tau_2} x_c - c_1(x_c\tau_1)^{\alpha_1 - 1} e^{-\beta_1 x_c \tau_1} x_c \frac{d\tau_1}{d\tau_2} = c_1 x_c^{\alpha_1} \tau_2^{\alpha_1 - 1} e^{-\beta_1 x_c \tau_2} \left[1 - \exp((\tau_2 - \tau_1)(\beta_1 x_c - \alpha_1)) \frac{\tau_2 - 1}{\tau_1 - 1}\right],$$

$$p_2 = c_2 x_c^{\alpha_2} \tau_2^{\alpha_2 - 1} e^{-\beta_2 x_c \tau_2} \left[1 - \exp((\tau_2 - \tau_1)(\beta_2 x_c - \alpha_2)) \frac{\tau_2 - 1}{\tau_1 - 1}\right],$$

where $c_i = \beta_i^{\alpha_i}/\Gamma(\alpha_i)$. Then, recalling the definition
of $x_c$, we have that the ratio $p_1/p_2 = (c_1/c_2) x_c^{\alpha_1 - \alpha_2} \tau_2^{\alpha_1 - \alpha_2} e^{(\beta_2 - \beta_1) x_c \tau_2} = (c_1/c_2) x_c^{\alpha_1 - \alpha_2} (\tau_2 e^{-\tau_2})^{\alpha_1 - \alpha_2}$
strictly decreases as $\tau_2$ increases from 1 to $\infty$ when
$\alpha_1 > \alpha_2$ because $\alpha_1 - \alpha_2 = (\beta_1 - \beta_2)x_c > 0$, and is
identically 1 when $\alpha_1 = \alpha_2$. Suppose then that $\alpha_1 > \alpha_2$,
so there is at most one $\tau_2$ value where $p_1 = p_2$
and the derivative is 0. If $(p_1/p_2)|_{\tau_2 = 1} \le 1$, then $p_1 - p_2 < 0$ for all $\tau_2 > 1$ and $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2)$ strictly decreases from 0.
This cannot hold because $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2) \to 0$ as $\tau_2 \to \infty$. Hence, $(p_1/p_2)|_{\tau_2 = 1} > 1$ and $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2)$ increases from 0 near $\tau_2 = 1$
and decreases to 0 as $\tau_2 \to \infty$. Therefore, $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2)$ goes up from
0 and down to 0 as $\tau_2$ increases from 1 to $\infty$, and
we have $M_{1T}(x_c\tau_1 < 1/t < x_c\tau_2) - M_{2T}(x_c\tau_1 < 1/t < x_c\tau_2) \ge 0$ for all $\tau_2 > 1$. □
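
The implicit relation between the two roots used in this proof, $\log(\tau_1/\tau_2) = \tau_1 - \tau_2$ (equivalently $\tau_1 e^{-\tau_1} = \tau_2 e^{-\tau_2}$), can be explored numerically. The Python sketch below solves for $\tau_1 \in (0, 1]$ given $\tau_2 \ge 1$ by bisection and compares the stated derivative $d\tau_1/d\tau_2 = (\tau_2 - 1)\tau_1/[(\tau_1 - 1)\tau_2]$ with a central finite difference. The function names, the bisection scheme and the step size are our own choices for illustration, not part of the paper.

```python
import math


def tau1_from_tau2(tau2: float) -> float:
    """Solve log(tau1) - tau1 = log(tau2) - tau2 for tau1 in (0, 1]."""
    if tau2 < 1.0:
        raise ValueError("tau2 must be at least 1")
    target = math.log(tau2) - tau2
    lo, hi = 0.0, 1.0
    # h(tau) = log(tau) - tau is increasing on (0, 1), so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - mid < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def implicit_derivative(tau1: float, tau2: float) -> float:
    """d tau1 / d tau2 as stated in the proof (requires tau1 != 1)."""
    return (tau2 - 1.0) * tau1 / ((tau1 - 1.0) * tau2)


def finite_difference(tau2: float, h: float = 1e-5) -> float:
    """Central difference approximation of d tau1 / d tau2."""
    return (tau1_from_tau2(tau2 + h) - tau1_from_tau2(tau2 - h)) / (2.0 * h)


if __name__ == "__main__":
    for tau2 in (1.5, 2.0, 4.0):
        tau1 = tau1_from_tau2(tau2)
        print(tau2, tau1, implicit_derivative(tau1, tau2), finite_difference(tau2))
```

The derivative is negative for $\tau_2 > 1$: as $\tau_2$ moves away from 1, $\tau_1$ moves toward 0, so the interval $(x_c\tau_1, x_c\tau_2)$ widens in both directions.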